Introduction


In order to state a value for the minimum required
toughness in high strength steel one must consider simultaneously 
the effects of flaw dimensions, temperature, section thickness, 
stress or strain history, and working stresses. One does not at- 
tempt to calculate in advance from first principles how much 
toughness a material will exhibit in fracturing. Thus an experi- 
mental approach must be used. From the experimental view- 
point various quantities related to service requirements might 
be measured. In the straightforward approach the measured 
quantity would be either the allowable stress in the presence of a 
stated flaw size and shape or the allowable flaw size for the 
expected service stress. On the other hand, the measured quan- 
tity might be the safe working temperature range or sheet thick- 
ness for an assumed flaw size and stress level. The general 
problem is obviously complex and a considerable amount of 
sophistication in testing and interpretation may be required. 
For example, an existing visible crack is clearly not desirable 
even though tolerable. Equally undesirable is the danger of crack 
propagation in a material which exhibits a brittle fracture ap- 
pearance even though inspection has shown no flaws of significant 
size. Both conditions are likely if we continue to tolerate some- 
thing less than the best possible workmanship and if we insist on 
pushing the tensile yield strength of the steel beyond 230,000 psi 
in large structures. 


Theoretical Considerations 


The theory presented here is a mathematical analysis of the 
macro aspects of fracturing and is selected from the writings of 
G. R. Irwin. It is assumed that all materials especially after 
fabrication into structures have flaws considerably larger than 
atomic dimensions. If failure of the structure is to result from 
applying loads, at least one and probably many of the flaws will 
start to grow slowly as the load is increased until one such flaw 
reaches a critical size at which time the speed of propagation 
suddenly increases by a large factor and structural failure results. 


Minimum Toughness Requirements for 
High Strength Sheet Steel 


The mathematical treatment considered here centers mainly on 
the point of instability. It is assumed [1]* that the onset of in- 
stability coincides with onset of rapid fracture. 

Several important special cases are given here by result only 
since the complete derivations have been published elsewhere. 
The case of a running crack in a large pressure vessel is the 
simplest of all. For this the formula for a through the thickness 
crack is 


$$K_c^2 = E\mathcal{G}_c = \pi\sigma^2 a \quad\text{or}\quad K_c = \sigma\sqrt{\pi a} \tag{1}$$

describes the point of fracture instability. $K_c$ is called the stress intensity factor and $\mathcal{G}_c$ is the critical driving force at the beginning of fast fracture. At least two subtleties are contained in Equation (1). First, the critical crack length $2a$ includes the slow growth prior to instability, so that any crack revealed by inspection prior to load application is less than its final effective length. Second, the critical value $2a$, if observed as by motion pictures or marked as by ink staining, is still less than the true effective length because stress relaxation results from plastic flow at the crack tip. Irwin [2] has assumed that the latter correction can be stated as


$$\Delta a = \frac{p\,E\mathcal{G}_c^*}{2\pi\sigma_y^2} \tag{2}$$

where $\sigma_y$ is the yield strength and $p$ is a numerical factor initially assumed to be 1. If $\mathcal{G}_c$ is corrected for this then Equation (1) becomes

$$E\mathcal{G}_c^* = \pi\sigma^2(a + \Delta a) \quad\text{or}\quad K_c^* = \sigma\sqrt{\pi(a + \Delta a)} \tag{3}$$

* Numbers in brackets designate References at end of paper.

Contributed by the Metals Engineering Division of The American Society of Mechanical Engineers and presented at the ASME-AWS Metals Engineering Conference, Los Angeles, Calif., April 25-29, 1960. Manuscript received at ASME Headquarters, April 21, 1960. This paper was not preprinted.

Nomenclature

$K_c$ = stress intensity factor at instability
$2a$ = length of a crack
$2a_0$ = initial slot or crack length before slow growth in test specimen
$2a_0'$ = initial crack length assumed in a motor case
$\Delta a$ = correction to $a$ to allow for plastic relaxation
$\mathcal{G}_c$ = critical driving force or work per unit area fractured at the point of instability
$\mathcal{G}_c^*$ = value of $\mathcal{G}_c$ corrected for plasticity
$E$ = Young's modulus
$\sigma$ = greatest principal stress assumed disregarding any crack
$\sigma_y$ = yield strength determined in a uniaxial test at 0.2 per cent offset
$\sigma_{net}$ = the load divided by the holding area
$t$ = sheet thickness
$\beta$ = relative plasticity indicating the degree of departure from plane strain fracture
$\rho$ = density
$n$ = numerical factor relating allowable crack size to sheet thickness
$w$ = width of test specimen
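As a rough numerical illustration of the wide-plate formulas, the sketch below evaluates $E\mathcal{G}_c$ without the plasticity correction and $E\mathcal{G}_c^*$ with it. It assumes $p = 1$ by default and that the correction $\Delta a$ is evaluated with the corrected toughness, so that the corrected relation can be solved in closed form. The function names and the sample numbers are illustrative assumptions, not values from the paper.

```python
import math


def eg_uncorrected(sigma, a):
    """E*G_c for a through crack of half length a: pi * sigma^2 * a."""
    return math.pi * sigma**2 * a


def eg_corrected(sigma, a, sigma_y, p=1.0):
    """E*G_c* with the plastic-zone correction folded in.

    Substituting delta_a = p*E*G_c*/(2*pi*sigma_y^2) into
    E*G_c* = pi*sigma^2*(a + delta_a) gives a linear equation in E*G_c*.
    """
    denominator = 1.0 - p * sigma**2 / (2.0 * sigma_y**2)
    if denominator <= 0.0:
        raise ValueError("no finite solution for this stress ratio")
    return math.pi * sigma**2 * a / denominator


def plastic_correction(eg_star, sigma_y, p=1.0):
    """Correction delta_a to the crack half length for a given E*G_c*."""
    return p * eg_star / (2.0 * math.pi * sigma_y**2)


if __name__ == "__main__":
    # Hypothetical case: 0.2-in. through crack, stress at half of yield.
    sigma, sigma_y, a = 100_000.0, 200_000.0, 0.1
    eg = eg_uncorrected(sigma, a)
    eg_star = eg_corrected(sigma, a, sigma_y)
    print("K_c  =", math.sqrt(eg))
    print("K_c* =", math.sqrt(eg_star))
    print("delta_a =", plastic_correction(eg_star, sigma_y))
```

At a stress equal to the yield strength the corrected value is exactly twice the uncorrected one, which is why the correction cannot be ignored at high stress ratios.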


Similarly, the familiar Irwin tangent formula

$$E\mathcal{G}_c = \sigma^2 w \tan\frac{\pi a}{w}$$

becomes

$$K_c^{*2} = E\mathcal{G}_c^* = \sigma^2 w \tan\left(\frac{\pi a}{w} + \frac{E\mathcal{G}_c^*}{2w\sigma_y^2}\right) \tag{4}$$

In estimating a critical crack size it is not possible to dismiss the plasticity correction as being self-canceling. For illustration, assume for simplicity that in the $\mathcal{G}_c$ test using a centrally notched specimen of width $w$, $\sigma = \sigma_y/3$ and $2a = w/2$ at instability, where $2a$ is the critical (observed) crack length. By a graphical solution of Equation (4)

$$E\mathcal{G}_c^* = \frac{1.12}{9}\,w\sigma_y^2$$

Hence for a pressure vessel in which the nominal first principal stress is $\sigma$ in the vicinity of a flaw which grows to a critical length $2a_1$ as observed, Equation (3) gives

$$2a_1 = \frac{2(1.12)\,w}{9\pi}\left(\frac{\sigma_y^2}{\sigma^2} - \frac{1}{2}\right)$$

If in our analysis we do not make the plasticity correction, then from the same test

$$E\mathcal{G}_c = \frac{w\sigma_y^2}{9}$$

and for the critical crack size $2a_2$ in the pressure vessel we predict

$$2a_2 = \frac{2w}{9\pi}\,\frac{\sigma_y^2}{\sigma^2}$$

Then

$$\frac{2a_1}{2a_2} = 1.12 - 0.56\,\frac{\sigma^2}{\sigma_y^2}$$

In Fig. 1 a plot of $a_1/a_2$ versus $\sigma/\sigma_y$ is given for the pressure vessel; $\sigma/\sigma_y$ is the ratio of applied stress to the uniaxial yield strength of the material. It is clear from Fig. 1 that as the value of $\sigma/\sigma_y$ in the pressure vessel increases it becomes hazardous to neglect the plasticity correction. The values of $\sigma/\sigma_y$ in the pressure vessel above which it is unsafe to neglect the correction are given in Fig. 2. For $\sigma/\sigma_y$ below these limiting values it is conservative and unnecessarily pessimistic to neglect the correction.

The values of $E\mathcal{G}_c^*$, $E\mathcal{G}_c$, $2a_1$, and $2a_2$ are given in Table 1 for various values of $\sigma_{net}/\sigma_y$ assumed in the $\mathcal{G}_c$ test. Table 1 is based on the assumption that $2a/w = 0.5$ for the test. The determination of $\mathcal{G}_c^*$ rather than $\mathcal{G}_c$ is a simple matter. The graphical solution of Equation (4) leads to $\mathcal{G}_c^*/\mathcal{G}_c$ values as shown in Table 2 and plotted in Fig. 3. Using Fig. 3 one may obtain $\mathcal{G}_c^*$ by multiplying $\mathcal{G}_c$ by the appropriate numerical factor. The factor is dependent on the observed crack length and stress level in the test and these factors are allowed for in Fig. 3.

Fig. 1 Ratios of predicted unstable crack lengths with and without the Irwin plasticity correction

Table 1 Prediction of critical crack lengths in pressure vessels (from the tests in which $\pi a/w = \pi/4$)

$\sigma_{net}/\sigma_y$ assumed      With correction,                   Uncorrected,
in $\mathcal{G}_c$ test              $E\mathcal{G}_c^*$ from the test   $E\mathcal{G}_c$ from the test
0.5                                  [illegible]                        $\sigma_y^2 w/16$
0.67                                 $0.1245\,\sigma_y^2 w$             $\sigma_y^2 w/9$
1.00                                 $1.38\,\sigma_y^2 w/4$             $\sigma_y^2 w/4$
1.15                                 $2(0.3305)\,\sigma_y^2 w$          $0.3305\,\sigma_y^2 w$

(The columns giving the predicted critical crack lengths $2a_1$ and $2a_2$ in the pressure vessel are not legible in the source.)

Fig. 2 Values of $\sigma/\sigma_y$ in a pressure vessel above which it is unsafe to neglect the plasticity correction

Table 2 $\mathcal{G}_c^*/\mathcal{G}_c$ for the centrally notched sheet specimen calculated from $E\mathcal{G}_c^*/(\sigma^2 w) = \tan\left(\pi a/w + E\mathcal{G}_c^*/(2w\sigma_y^2)\right)$

$\sigma_{net}/\sigma_y$      $2a/w$ = 0.3     0.4      0.5      0.6
0                            1.00             1.00     1.00     1.00
0.5                          1.10             1.08     1.05     1.04
0.67                         1.18             1.14     1.12     1.09
1.00                         1.55             1.45     1.40     1.40
1.15                         1.96             1.93     1.93     No solution
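The graphical solution of the corrected tangent formula can also be approximated numerically. The following is a minimal sketch, assuming the dimensionless form $x = s^2\tan(\pi a/w + x/2)$ with $x = E\mathcal{G}_c^*/(w\sigma_y^2)$ and $s = \sigma/\sigma_y = (\sigma_{net}/\sigma_y)(1 - 2a/w)$. The function name, the grid scan, and the bisection are choices made here for illustration and are not the authors' procedure. It assumes $0 < 2a/w < 1$. Numerical roots can differ by a few per cent from the graphically determined entries of Table 2, and close to the condition where the two curves barely touch a root may not exist numerically.

```python
import math


def plasticity_ratio(net_over_yield, crack_over_width, steps=20000):
    """Return G_c*/G_c from the corrected tangent formula, or None if no root.

    net_over_yield   : sigma_net / sigma_y at instability in the test
    crack_over_width : 2a / w at instability in the test (0 < value < 1)
    """
    s = net_over_yield * (1.0 - crack_over_width)  # gross stress / yield
    theta = 0.5 * math.pi * crack_over_width       # pi * a / w
    if s == 0.0:
        return 1.0  # no plasticity correction at vanishing stress

    def residual(x):
        # x = E*G_c* / (w * sigma_y^2); a zero of this is a solution
        return s * s * math.tan(theta + 0.5 * x) - x

    x_limit = 2.0 * (0.5 * math.pi - theta)  # the tangent diverges at this x
    previous = 0.0
    for i in range(1, steps):
        x = x_limit * i / steps
        if residual(x) <= 0.0:
            # First sign change found: refine the smallest root by bisection.
            low, high = previous, x
            for _ in range(60):
                middle = 0.5 * (low + high)
                if residual(middle) > 0.0:
                    low = middle
                else:
                    high = middle
            root = 0.5 * (low + high)
            return root / (s * s * math.tan(theta))
        previous = x
    return None


if __name__ == "__main__":
    for net in (0.5, 0.67, 1.00, 1.15):
        row = [plasticity_ratio(net, c) for c in (0.3, 0.4, 0.5, 0.6)]
        print(net, row)
```

The smallest root is taken because it is the one that connects continuously to the uncorrected value as the stress goes to zero.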


The Estimation of Slow Growth of Through the Thickness 
Cracks to Critical Size 


Thus far we have ignored the problem of how much a crack of initial length $2a_0$ will slowly grow in reaching the critical crack length $2a$. The foregoing considerations applied only to critical crack length.

Fig. 3 Chart for use in directly converting uncorrected $\mathcal{G}_c$ to $\mathcal{G}_c^*$ (abscissa: $\sigma_{net}/\sigma_y$ at instability in the $\mathcal{G}_c$ test)

The prediction of slow crack growth cannot be done for the general case on the basis of present knowledge. Under the special conditions of $\mathcal{G}_c$ testing with 3-in-wide centrally notched specimens we have the results for seven steels.

Fig. 4 is a plot of $a/a_0$, determined by ink staining, as a function of $E\mathcal{G}_c^*/\sigma_y^2$. Except for per cent shear in fracture appearance below 20 per cent the average growth was by a factor of about 1.6. Fig. 5 is a plot of $a/a_0$ versus per cent shear only. Fig. 5(a) shows the effect of specimen width. The scatter in results precludes attributing a very significant dependence on either specimen thickness, per cent shear, or $E\mathcal{G}_c^*/\sigma_y^2$. For fractures below 20 per cent shear in appearance, an average ratio $a/a_0 = 1.2$ would be more appropriate. Thus far it appears that $a/a_0$ is independent of specimen width. If a fixed rule or formula for estimating slow growth is adopted, such as is proposed here, a restriction in the scatter must naturally result. If we are interested only in average or most likely values then estimation of slow crack growth without ink staining is justified. It does not necessarily follow that such arbitrary restriction of the scatter in calculated results best represents the true behavior of the steels. See Appendix 1.
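The averaging rule described above can be written as a small helper. This is only a sketch of the rule of thumb stated in the text (a factor of about 1.6 in general, about 1.2 for fractures showing less than 20 per cent shear); the function name and the hard threshold are assumptions made for illustration, and the scatter in the data is ignored.

```python
def estimated_critical_length(initial_length, percent_shear):
    """Estimate the crack length at instability from the initial slot length.

    Uses the average slow-growth ratios quoted in the text:
    a/a0 of about 1.2 below 20 per cent shear, about 1.6 otherwise.
    """
    if initial_length < 0.0:
        raise ValueError("initial_length must be non-negative")
    growth_ratio = 1.2 if percent_shear < 20.0 else 1.6
    return growth_ratio * initial_length


if __name__ == "__main__":
    # Hypothetical 0.75-in. initial slot in a 3-in-wide specimen.
    print(estimated_critical_length(0.75, 100.0))
    print(estimated_critical_length(0.75, 10.0))
```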


Fig. 4 Plot of slow crack growth in $\mathcal{G}_c$ test specimens as determined by ink staining (steels plotted include Ladish D6A, AMS 6434, 4340, and Tricent)

Fig. 5 $a/a_0$ as a function of per cent shear in 3-in-wide specimens of all steels tested (abscissa: per cent shear in fast fracture)

Fig. 5(a) $a/a_0$ as a function of specimen width. All specimens exhibited 100 per cent shear. (0.092-in-thick Tricent (300 M); all specimens scaled; $2a_0 = 0.25w$ (Smith & Anderson); length = $3w$; $\sigma_y \approx 230{,}000$ psi; Elox notches; abscissa: specimen width $w$, in.)

For purposes of simplicity assume each natural crack has a 60 per cent slow growth prior to unstable rapid fracture. Then in a test

$$E\mathcal{G}_c^* = \sigma^2 w \tan\left[\frac{1.6\pi a_0}{w} + \frac{E\mathcal{G}_c^*}{2w\sigma_y^2}\right] \tag{5}$$

We are interested in predicting the slow crack growth in pressure vessels in which we do not know and preferably would never know directly the fracture appearance. Therefore a simple multiplying factor on $a_0$ would be readily useful in predicting failure stresses in the presence of prior cracks of length $2a_0$ as measured by nondestructive techniques. Ink staining of through cracks in pressure vessels afterward pressurized up to failure has not been done. We therefore tentatively adopt the same rule as above: that $a/a_0$ will be on the average 1.6, subject to future revision.

Fig. 6 Observed percentage shear as a function of $\beta$ (key: 4340; 4335; Ladish D6A; Vascojet 1000; Peerless 56; MBMC; 6434; Tricent, air melt and vacuum melt; 6434, air melt and vacuum melt; welded 6434, air melt)

4 / MARCH 1961 Transactions of the ASME

Writing Equation (3) as

$$E\mathcal{G}_c^* = \frac{\pi\sigma^2 a}{1 - \dfrac{\sigma^2}{2\sigma_y^2}}$$


we can now determine the minimum toughness requirement. For most $\mathcal{G}_c$ test specimens the critical crack $2a$ is about $w/2$. Using this assumption and placing the restriction that the pressure vessel must not fail at a stress less than the tensile yield strength $\sigma_y$ of the steel we proceed to plan a suitable $\mathcal{G}_c$ test. We can in theory select a specimen of such width $w$ that at failure the average net section stress is the yield strength of the steel. Thus for the test

$$\sigma = \frac{\sigma_{net}}{2} = \frac{\sigma_y}{2}$$

For the test

$$\tan\left(\frac{\pi}{4} + \frac{E\mathcal{G}_c^*}{2w\sigma_y^2}\right) = 1.4$$

and

$$E\mathcal{G}_c^* = \sigma_y^2(0.35)w$$

For the pressure vessel let $a_0'$ be the initial half length of a crack before stressing. With $a = 1.6a_0'$ and $\sigma = \sigma_y$,

$$K_c^{*2} = E\mathcal{G}_c^* = 1.6\pi\sigma_y^2(2a_0') = \sigma_y^2(0.35)w$$

so that $1.6\pi(2a_0') = 0.35w$ and

$$w \ge 14.4(2a_0')$$

This means that if prior through cracks of length $2a_0'$ are present in the pressure vessel then the vessel will not fail below the yield strength if a net section stress of at least $\sigma_y$ is reached in a $\mathcal{G}_c$ test specimen of width $14.4(2a_0')$ or greater.
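The arithmetic behind the minimum width can be reproduced in a few lines. This sketch takes the graphically determined tangent value of 1.4 quoted in the text as an input rather than solving for it, and assumes the vessel is stressed to the yield strength; the function name and its default arguments are illustrative assumptions.

```python
import math


def min_width_per_crack_length(gross_stress_ratio=0.5, tan_value=1.4, growth=1.6):
    """Return w_min / (2*a0') for a vessel that must reach yield.

    gross_stress_ratio : sigma / sigma_y in the test (0.5 when 2a = w/2
                         and the net section stress equals yield)
    tan_value          : value of the tangent term in the test (1.4 in the text)
    growth             : assumed slow-growth ratio a / a0' in the vessel
    """
    # Test: E*G_c* / (w * sigma_y^2)
    toughness_per_width = gross_stress_ratio**2 * tan_value
    # Vessel at sigma = sigma_y: E*G_c* / (sigma_y^2 * 2*a0')
    demand_per_crack_length = growth * math.pi
    return demand_per_crack_length / toughness_per_width


if __name__ == "__main__":
    factor = min_width_per_crack_length()
    print("w_min / (2 a0') =", factor)
    # Largest thickness a 3-in-wide specimen covers if a crack 2t long is assumed.
    print("t_max for w = 3 in.:", 3.0 / (2.0 * factor))
```

Setting `growth=1.0` gives the less conservative case in which no slow growth is allowed for in the pressure vessel.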

There has been a tendency to concede that cracks of length $2a_0' \le 2t$ will not always be avoided; also there is a tendency to emphasize the achievement of yield strength in both pressure vessels and test coupons. This would be a compatible basis for acceptance if

$$w \ge 14.4(2t) \quad\text{or}\quad w \ge 29t$$

For steel 0.1 in. thick or less the currently used 3-in-wide specimen is therefore adequate. The assumption of $2a/w = 0.5$ is not unreasonable; however, if other values of $2a/w$ are assumed the relations are as shown in Table 3.

Table 3 Minimum specimen widths

$2a/w$ in $\mathcal{G}_c$ test      $w$ minimum      $t_{max}$ for $w$ = 3 inches
0.3                                 [illegible]      0.115 in.
0.4                                 26.5t*           0.113 in.
0.5                                 29t              0.103 in.
0.6                                 34t              0.088 in.

* $t$ = plate thickness


Unless the width of the $\mathcal{G}_c$ test piece equals or exceeds the values in Table 3 the achievement of $\sigma_{net} = \sigma_y$ in the test does not guarantee that prior cracks $2t$ in length will be stable in the pressure vessels at $\sigma = \sigma_y$. This takes into account both the expected slow growth and the plasticity correction. Provided all of the requirements are met, the higher the yield strength the higher will be the strength of the vessel.

In view of the fact that we have slow crack growth data only in test specimens it is possible that we have been too pessimistic about the growth of small cracks in pressure vessels. Accordingly a less conservative estimate of $w_{min}$, $K^*_{c\,min}$, and $\mathcal{G}^*_{c\,min}$ is now made in which slow growth is not allowed for in the pressure vessel. Table 4 lists $w_{min}$ for this case. A crack of length $2t$ is assumed.

The $w_{min}$ values in Table 4 provide that if $\sigma_{net} \ge \sigma_y$ in the test a crack $2t$ long will be stable in the pressure vessel when $\sigma = \sigma_y$.
Table 4 $w_{min}$ assuming no slow growth of cracks in the pressure vessel

$t_{max}$ for $w$ = 3 inches
0.185 in.
0.180 in.
0.166 in.
0.146 in.


Other requirements on $w$ may be made in order to achieve desired accuracy in making the $\mathcal{G}_c^*$ calculations. This has been discussed elsewhere by Irwin [3] and will not be repeated here.


Minimum $K_c^*$ and $\mathcal{G}_c^*$ Requirements


It is more direct and general to specify the minimum $K_c^*$ and $\mathcal{G}_c^*$ than to depend on achieving yield strength in the net section of a small test coupon.

For through prior cracks of length $2a_0' = 2t$ in a chamber and where $2a$ after crack growth in raising the stress to $\sigma_y$ is $1.6(2a_0')$

$$K^{*2}_{c\,min} = E\mathcal{G}^*_{c\,min} = 3.2\pi\sigma_y^2 t$$

from Equation (3), or

$$K^*_{c\,min} = 3.17\,\sigma_y\sqrt{t}$$

$$\mathcal{G}^*_{c\,min} = 0.335 \times 10^{-6}\,\sigma_y^2 t$$

for steel.


Table 5 Minimum required values of $K_c^*$ and $\mathcal{G}_c^*$ with allowance for slow growth

$\sigma_y$ assumed      $K^*_{c\,min}$      $\mathcal{G}^*_{c\,min}$, steel      $\mathcal{G}^*_{c\,min}$, steel, for $t$ = 0.080
180,000 psi             [illegible]         10850t                               868
190,000                 [illegible]         12100t                               968
200,000                 [illegible]         13400t                               1070
210,000                 [illegible]         14750t                               1180
220,000                 [illegible]         16200t                               1295
230,000                 [illegible]         17700t                               1415
240,000                 [illegible]         19300t                               1545

If no slow growth of cracks is to be expected, the minimum $K_c^*$ and $\mathcal{G}_c^*$ values can be relaxed also as shown in Table 6

$$K^{*2}_{c\,min} = E\mathcal{G}^*_{c\,min} = 2\pi\sigma_y^2 t$$

from Equation (3).


Table 6 Minimum $K_c^*$ and $\mathcal{G}_c^*$ with no allowance for slow crack growth in the pressure vessel

Assume the largest prior crack $2a_0' = 2t$ in length

$\sigma_y$ assumed      $K^*_{c\,min}$                    $\mathcal{G}^*_{c\,min}$ (steel)
180,000                 $451\sqrt{t} \times 10^3$         6790t
190,000                 $476\sqrt{t}$                     7580t
200,000                 $502\sqrt{t}$                     8380t
210,000                 $527\sqrt{t}$                     9230t
220,000                 [illegible]                       10120t
230,000                 [illegible]                       11080t
240,000                 $602\sqrt{t}$                     12050t
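The minimum values in the tables follow from the corrected wide-plate relation evaluated at a stress equal to the yield strength. The sketch below is one way to regenerate them; it assumes a Young's modulus of 30 × 10⁶ psi for steel (an assumption chosen here because it is consistent with the coefficient 0.335 × 10⁻⁶ in the text) and a slow-growth ratio of 1.6 or 1.0. The function name and arguments are illustrative.

```python
import math

E_STEEL = 30.0e6  # psi; assumed modulus, not stated explicitly in the text


def minimum_toughness(sigma_y, t, growth=1.6, n=1.0, modulus=E_STEEL):
    """Return (K_c*_min, G_c*_min) for a prior through crack of length n*(2t).

    With sigma = sigma_y and a = growth * n * t, the corrected relation
    E*G_c* = pi*sigma^2*a / (1 - sigma^2 / (2*sigma_y^2)) reduces to
    E*G_c* = 2*pi*growth*n*sigma_y^2*t.
    """
    eg_star = 2.0 * math.pi * growth * n * sigma_y**2 * t
    return math.sqrt(eg_star), eg_star / modulus


if __name__ == "__main__":
    for sigma_y in range(180_000, 250_000, 10_000):
        k_slow, g_slow = minimum_toughness(sigma_y, 1.0)           # with slow growth
        k_none, g_none = minimum_toughness(sigma_y, 1.0, growth=1.0)  # without
        print(sigma_y, round(k_slow), round(g_slow), round(k_none), round(g_none))
```

Values are printed per unit thickness; multiply the $K$ entries by $\sqrt{t}$ and the $\mathcal{G}$ entries by $t$ for a given sheet.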


The assumptions on which the minimum requirements listed in Tables 3, 4, 5, and 6 are based are somewhat arbitrary and subject to revision in the future. The estimates appear to be the best that we can make at this time. Investigations now underway at the University of Illinois, Swarthmore College, and Stanford Research Institute are expected to help in estimating slow crack growths. Work by H. L. Smith and others will continue to help in refining the formulas for the plasticity correction to the driving force $\mathcal{G}$. These two effects are large but necessary modifications of the initial Griffith-Irwin formulas.
Representative Values of $\mathcal{G}_c^*$ for Various High Strength Steels


Figs. 7-12 show representative curves of $\mathcal{G}_c^*$ plotted as a function of tempering temperature for specified sheet thicknesses. In particular, Fig. 7 in connection with Tables 5 and 6 shows that at the crossover point for net section stress and yield strength the toughness is just barely sufficient to meet minimum requirements including an allowance for slow crack extension at the 200,000 psi yield strength level. However, if this steel is to be used at 240,000 psi working stress and yield strength it will fall far short of supporting a crack $2t$ in length.

In Fig. 9 and Tables 5 and 6 one may note that the minimum toughness requirements for a $2t$ crack are easily met at the 200,000 psi strength level in Ladish D6A steel. However, at the 240,000 psi strength level this steel also fails to meet the minimum toughness requirement for a $2t$ crack. The degree of decarburization is important but not shown; however, the effect of plate thickness for one of the steels, 6434, is shown in Fig. 12. The criterion of $w_{min}$ is easily met by the 3-in-wide specimens used for the tests represented except for the thicker specimens of 6434 steel. Accordingly the crossover of curves $\sigma_{net}$ with $\sigma_y$ indicates satisfactory performance. We do not encourage adoption of this as the sole criterion because it fails to distinguish between steels which are more than adequate in toughness.

Fig. 7 Toughness and net section stress achieved in the $\mathcal{G}_c$ test of SAE 4340 steel. The specimen was 3 in. wide by 12 in. long, centrally slotted. (AISI 4340, $t$ = 0.10 in.; longitudinal and transverse specimens; ordinates: stress, psi × 10³, and toughness, in-lb/sq in.; abscissa: tempering temperature, °F)

Fig. 8 Toughness and net section stress achieved in the $\mathcal{G}_c$ test of SAE 4335 steel. The specimen was 3 in. wide by 12 in. long, centrally slotted. (AISI 4335, $t$ = 0.10 in.; longitudinal and transverse specimens; ordinates: stress, psi × 10³, and toughness, in-lb/sq in.; abscissa: tempering temperature, °F)


Correlation of $\beta = E\mathcal{G}_c^*/(t\sigma_y^2)$ With Minimum Toughness Requirements


As shown by Irwin [3] the value of $\beta$ has a special significance with respect to fracture appearance. He has predicted that for $\beta = 2\pi$ full shear fractures should be obtained. This is borne out by Fig. 6 showing data for a number of steels.

We have arbitrarily chosen the allowable crack length, $2a_0' = 2t$, in constructing tables of minimum requirements so that

$$\mathcal{G}^*_{c\,min} = \frac{2\pi\sigma_y^2 t}{E}$$

from Equation (3).


Fig. 9 Toughness and net section stress achieved in the $\mathcal{G}_c$ test of Ladish D6A steel. The specimen was 3 in. wide by 12 in. long, centrally slotted. (Ladish D6A, $t$ = 0.110 in.; toughness ordinate in in-lb/sq in.; abscissa: tempering temperature, °F)

Fig. 10 Toughness and net section stress achieved in the $\mathcal{G}_c$ test of Peerless 56 steel. The specimen was 3 in. wide by 12 in. long, centrally slotted. (Peerless 56, $t$ = 0.09 in.; abscissa: tempering temperature, °F)
By definition

$$\mathcal{G}_c^* \text{ observed in test} = \frac{\beta\sigma_y^2 t}{E}$$

Let $\beta/2\pi$ observed = toughness safety factor (T.S.F.). Hence by definition $\beta \ge 2\pi$ for T.S.F. $\ge 1$.

If we wish to specify a tolerable crack requirement as $2a_0' = n(2t)$ for $\sigma = \sigma_y$ in the pressure vessel, then the new requirement is that

$$\mathcal{G}_c^* \text{ observed} \ge \frac{2\pi n\,\sigma_y^2 t}{E}$$

from Equation (3), or $\beta/2\pi \ge n$ is required.

The mathematics is clear but the accuracy of Equation (3) as $\sigma \to \sigma_y$ must be examined experimentally, as is also necessary for Equation (4) as $\sigma_{net}$ approaches or exceeds $\sigma_y$.
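The toughness safety factor lends itself to a direct check of test data. The following is a minimal sketch, assuming $\beta = E\mathcal{G}_c^*/(t\sigma_y^2)$ as defined in the section heading and no allowance for slow growth in the vessel; the function names and the sample inputs are hypothetical.

```python
import math


def relative_plasticity(g_star, modulus, thickness, sigma_y):
    """beta = E * G_c* / (t * sigma_y^2)."""
    return modulus * g_star / (thickness * sigma_y**2)


def toughness_safety_factor(beta):
    """T.S.F. = beta / (2*pi); 1.0 corresponds to a tolerable crack of length 2t."""
    return beta / (2.0 * math.pi)


def tolerates_crack(beta, n=1.0):
    """True if a prior through crack of length n*(2t) is tolerable at yield."""
    return toughness_safety_factor(beta) >= n


if __name__ == "__main__":
    # Hypothetical test result: G_c* = 900 in-lb/sq in., t = 0.1 in., 200,000 psi yield.
    beta = relative_plasticity(900.0, 30.0e6, 0.1, 200_000.0)
    print("beta =", beta)
    print("T.S.F. =", toughness_safety_factor(beta))
    print("tolerates 2t crack:", tolerates_crack(beta))
```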


Fig. 11 Toughness and net section stress achieved in the $\mathcal{G}_c$ test of MBMC No. 1 steel. The specimen was 3 in. wide by 12 in. long, centrally slotted. (MBMC, $t$ = 0.05 in.; abscissa: tempering temperature, °F)

Fig. 12 The effect of plate thickness on the toughness and net section stress achieved in the $\mathcal{G}_c$ test of AMS 6434 steel. The specimen was 3 in. wide by 12 in. long. (abscissa: reciprocal of thickness)
Conclusions


1 It is concluded that for test purposes the minimum width requirement for specimens is being met for most materials by a specimen width of 3 in.

2 Tables of minimum $\mathcal{G}_c^*$ and $K_c^*$ values are provided in which it is assumed that a through crack of length $2t$ must be anticipated.

3 The calculation of minimum requirements requires further refinements which will come through continued research on: (a) the plasticity at the end of a crack; and (b) the slow growth during the loading or holding at constant load during service.

4 Under certain restricted conditions specified in this paper the achievement of a net section stress equal to the yield strength in the centrally notched sheet specimen guarantees the achievement of a strength equal to or above yield strength in a pressure vessel containing a largest through crack of length $2t$, where $t$ is the wall thickness.

5 General adoption of a criterion of the net section stress equaling yield strength is not encouraged because it fails to adequately assess the crack resistance of superior materials.

6 If a through crack of length $2a = n(2t)$ must be tolerated and the yield strength must be achieved in a pressure vessel, a good quality factor is suggested as

Q.F. = [expression illegible in source]

where $\rho$ is the density and $n$ is any numerical value we choose to select on the basis of what can be done in manufacturing and inspection. This Q.F. would reflect the advantages of low density, high yield strength, high toughness, and high modulus.

7 It is clear that with the steels at hand we cannot tolerate a prior crack through the thickness and $2t$ in length at a 240,000-psi stress level. The allowable size of surface crack will be considerably less than $t$ in depth. An embedded crack, or its equivalent in terms of an unfavorably shaped and oriented inclusion, will have to be less than $t$ in diameter. Accumulation of data on this topic is in progress. Preliminary results indicate $n = 0.2$ will be tolerable.


APPENDIX


Irwin Alternative Method for Estimating the Slow Growth of the Crack in the $\mathcal{G}_c$ Test


In general the prediction of the slow growth cannot be done on the basis of present knowledge; under the special conditions of $\mathcal{G}_c$ testing with the 3-in-wide centrally notched sheet specimen a good estimate can be made. The basis, as provided by Irwin [3], is as follows.

Let $R$ be the resistance or energy per unit area consumed in the slow growth which is balanced by the crack extension force $\mathcal{G}$. This is rising as the load increases in the test and the crack slowly extends.

We assume at first

$$R = \left(\frac{a_s}{c'}\right)^n \mathcal{G}_{Ic} \tag{7}$$

The increment of slow growth beyond the original starting notch is represented by $a_s$. When $a_s$ reaches the value $c'$ it is assumed that the resistance equals the plane strain fracture toughness $\mathcal{G}_{Ic}$. When $a_s$ reaches the value $a_{sc}$, the onset of rapid fracture occurs and $R = \mathcal{G}_c$. The exponent $n$ is a number less than unity adjusted with $c'$ to fit experimental data. The value of $\mathcal{G}$ which must equal $R$ until $a_s = a_{sc}$ is

[Equation (8), illegible in source]




Fig. 13 Chart to be used to determine $q_1$ (the relation defining $K_c^*$ in terms of $q_1$ is illegible in the source). This makes allowance for slow crack growth without ink staining and plasticity in the $\mathcal{G}_c$ test using a centrally notched specimen of width $w$ and thickness $B$.


It is assumed that instability occurs at the maximum load point where

[Equation (9), illegible in source]

From Equations (7), (8), and (9) we have at instability

[Equations (10) and (11), illegible in source]

Some steps are omitted here since they have been published elsewhere [3].

Information from a large body of $\mathcal{G}_c$ data with staining for direct indication of slow crack growth was used to study the terms of Equations (10) and (11). It was found that these data were fitted within experimental error taking $n$ = [fraction illegible in source]. A plot of $C$ versus per cent shear then led to

[Equation (12), illegible in source]


where 


C = 4.7 (P — 0.43) a (13) 


In making $\mathcal{G}_c$ tests in which $2a_0$, the initial central crack length, is known but no ink staining is used it is necessary to record:


1 Gross section stress $\sigma$

2 Original half length of the slot $a_0$

3 Running crack shear lip fraction $P$ measured between $B$ and $2B$ from the edge of a centrally notched specimen

4 Specimen thickness $B$

5 Specimen width $w$

6 Temperature

7 A value of the tensile yield strength


Equation (12) is then rewritten as

[Equation (14), illegible in source]

A graph of $y$ versus $z$ is given in Fig. 13 for constant $q_1$ values. From Equation (8) we have

[equation illegible in source]
= 
W. N. FINDLEY
P. N. MATHUR
E. SZCZEPANSKI
A. O. TEMEL


Energy Versus Stress Theories for

Combined Stress—A Fatigue Experiment 
Using a Rotating Disk 


An experiment is described in which the strain energy at the critical location for fatigue failure is maintained constant while the stresses on a given plane of the material at the same location are caused to fluctuate. Apparatus developed to produce this condition consisted of a circular disk with a wide flanged rim which was loaded along a diameter by means of pivot-pad bearings. The disk was then rotated under a constant load to produce the desired fluctuation in stresses at the center of the disk while maintaining a constant strain energy at the center. The fact that fatigue cracks were developed in the region of constant strain energy was considered to indicate that a concept of a fluctuating strain energy as a basic theory of failure by fatigue under combined stresses is not tenable.


A NUMBER of theories have been proposed as govern- 
ing fatigue failure under combined stresses. Some of these were 
reviewed in a previous paper [1], in which references to the original
work may be found. These theories have been based on some
of the following concepts of the nature of fatigue failure: (a) 
a limiting stress: the principal stress, the principal shear stress, 
or the octahedral shear stress; (b) a limiting strain: the principal 
strain or the principal shear strain; (c) a limiting strain energy: 
the total energy of deformation or the energy of distortion; (d) 
an empirical criterion: the Mises criterion, the ellipse quadrant, 
the ellipse arc, the complete Guest’s law. Of these the mathematical
form of the octahedral shear stress, the energy of distortion,
and the Mises criterion are the same except for a coefficient
and a radical sign. All of these theories are concerned
with the engineering concepts of stress, strain, and strain energy 
of a homogeneous medium. The question is, which, if any, of 
these theories is correct. 

It was observed by Gough and Pollard [2] that, whereas all 
the theories (a), (b), and (c) except two predicted a fixed ratio 
of the fatigue strength in bending b to that in torsion t, actual
test data showed a wide variation in the ratio of b/t. (The ratio
b/t is a function of Poisson’s ratio for the principal strain theory
and the total energy of deformation, but the observed variations 
of b/t are much greater than can be accounted for by this varia- 
ble.) Such observations led Gough and Pollard [2] to propose 
the empirical equations, the ellipse quadrant, and the ellipse 
arc. These were found to describe the fatigue behavior under 
combined bending and torsion of ductile metals and cast irons, 
respectively, better than the other relations listed. 

The ellipse quadrant and ellipse arc expressions contain the 
values of b and t in such a way that the equations are always
satisfied for pure bending and pure torsion. A study [1] of the
foregoing relations and available data led to the suggestion that 
anisotropy might be a prime cause of the lack of agreement 
between test data and such theories as the principal shear stress. 
When a suitable correction factor was introduced for anisotropy, 
it was found that the ellipse quadrant equation could be derived 
from any one of the following theories: the principal shear stress, 
the principal shear strain, the energy of distortion, the octahedral 
shear stress, or the total energy of distortion. It was subse- 
quently found [3] that the ellipse arc resulted from applying 
the same correction to the principal strain theory and that such 
a correction applied to the principal stress theory yielded a 
parabola which was in better agreement with data for cast 
irons than the ellipse arc. 

The few data available covering biaxial tests in both quad- 
rants, tension-tension, and tension-compression, show too much 
scatter to yield a reliable choice between the principal shear and 
distortion energy theories [3]. 

Recently the idea mentioned in a previous paper [1] that fatigue 
is initiated by repeated shear stress but that the presence of a 
normal stress on the critical shear stress plane influences the 
fatigue strength has been explored in mathematical form [4, 5]. 
The resulting predictions for combined bending and torsion have 
been found to yield the best over-all description of available test 
data. An extension of this concept to include the effect of mean 
stress [6] appears to be in agreement with many of the availa- 
ble observations. 

In a previous paper [1] several facts which were difficult to 
reconcile with the strain energy concept were described. These 
included the following: Microscopic studies of slip band and 
fatigue crack formation by several investigators (including more 
recent studies) have indicated that fatigue cracks develop in 
regions of heavy slip. In fact, it has been observed for copper 
[7] that initial fatigue cracks followed the direction of the slip 
system. This being the case, the orientation of stress relative 
to potential slip planes would seem to be an important factor in 
the initiation of fatigue cracks. Since energy is a scalar quantity, 
its properties are independent of orientation. Thus it seems rea- 
sonable to question whether such a scalar quantity could be the 
controlling factor. In the usual fatigue test both stress and strain 
energy fluctuate so that it is not possible to determine directly 
which is actually responsible for failure. 


Proposed Experiment 


During a conversation several years ago, Franklin Fowler, 
Jr., suggested that it would be interesting if a means could be 
devised to perform a fatigue test in which the strain energy in the 
critical region remained constant while the principal stresses 
fluctuated. 

Such a test is the subject of this paper. The desired conditions 
would be achieved if a constant state of stress could be applied to 
the critical region of a specimen with constant intensity, while 
the specimen was rotated continuously with respect to the applied 
loads. Thus the strain energy at the critical region would re- 
main constant since the magnitudes of the stresses remain con- 
stant. But the orientation of the principal stresses would change 
with respect to the specimen. Thus on any particular plane of 
the specimen, the stresses would fluctuate. Hence fatigue 
failure should never occur in such a test if strain energy con- 
trolled fatigue. However, if stress theories controlled fatigue, 
stresses of sufficient intensity should cause fatigue fracture. 
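
The reasoning above can be checked numerically with the plane-stress transformation equations. The following is only a sketch and is not part of the original paper: the principal stresses of 5400 and -14,800 psi are borrowed from Table 1 for illustration, the elastic constants are omitted from the energy measure, and the function names are hypothetical.

```python
import math

def stress_on_plane(s1, s2, theta_deg):
    """Normal and shear stress on a plane whose normal makes the angle
    theta with the sigma-1 direction (plane stress, Mohr's circle)."""
    t = math.radians(theta_deg)
    mean = 0.5 * (s1 + s2)
    radius = 0.5 * (s1 - s2)
    normal = mean + radius * math.cos(2.0 * t)
    shear = -radius * math.sin(2.0 * t)
    return normal, shear

def distortion_energy_measure(s1, s2):
    """Quantity proportional to the shear strain energy for plane stress;
    the elastic constants are left out because they are only a factor."""
    return s1 * s1 - s1 * s2 + s2 * s2

S1, S2 = 5400.0, -14800.0  # psi, illustrative values taken from Table 1

# The energy measure depends only on the principal magnitudes, so it is
# the same for every angular position of the rotating disk.
energy = distortion_energy_measure(S1, S2)

# Stresses on one fixed material plane during one revolution.
history = [stress_on_plane(S1, S2, angle) for angle in range(0, 361, 15)]
normals = [n for n, _ in history]
shears = [s for _, s in history]

print("energy measure (constant):", energy)
print("normal stress range, psi:", min(normals), max(normals))
print("shear stress range, psi:", min(shears), max(shears))
```

The normal stress on the fixed plane swings between the two principal values and the shear stress reverses completely, while the energy measure does not change.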

The method of loading finally devised to meet the desired con- 
ditions employed a flanged disk loaded along a diameter with a 
constant force, as suggested in a previous paper [1]. If the loads 
were transmitted to the disk through rollers, for example, and 
the disk was rotated, the strain energy at the center of the disk 
would remain constant but the stresses on a given transverse 
plane at the center of the disk would fluctuate from compression 
to tension as the disk rotated. 

Thus this scheme provided all necessary conditions except that 
the center of a flat disk would not be the critical location. The 
stresses are much more severe in the region of contact with the 
loading rollers, so that fatigue failure would occur there. Several 
design studies were made to try to transfer the critical region to 
the center of the disk by dishing the surfaces of the disk to make 
it thin at the center and thick at the edge, by strengthening the 
rim through heat-treatment, carburizing, etc. However, none 
appeared promising. Finally a suggestion by H. Langhaar that 
the loads be applied through pivot-pad bearing surfaces led to a 
solution. 


Specimen and Apparatus 


The specimen finally used employed a wide rim to receive the 
pressure from partial journal bearing surfaces made of bronze 
backed by a steel block and pivoted on a steel ball. The web, 
or disk, was made thinnest at the center and increasingly thicker 
toward the edge. Originally, the shape was chosen such that 
the thickness increased in proportion to the shearing stress along 
the diameter of loading as observed in photoelastic studies of a 
disk of constant thickness loaded along a diameter by con- 
centrated forces at the edges. However, this shape proved to 
be too difficult to machine, so it was approximated by making 
the cross section taper from 0.08-in. thick at the center to 0.10- 
in. thick at 1.15-in. radius. A fillet of '/,-in. radius was used to 
join the web to the rim. A diagram of the specimen employed 
is shown in Fig. 1. 

The apparatus used to test the specimen is shown in Fig. 2. 
The specimen was squeezed between two bronze bearing blocks, 
A, Fig. 2. The force was applied to the back of the bearing 
blocks through steel balls, and the force was produced by tighten- 
ing the nuts on bolts, B. The bars, C, served as springs to apply 
the load and also to measure the load approximately. For this 
purpose, an SR-4 gage was fastened to one of the bars and cali- 
brated to indicate the force applied to the specimen. 

Lubrication was provided by filling the box shaped frame 
around the bearing with lubricant. The specimen was rotated 
by a two-horsepower motor driving the vertical spindle, D. 
The lower end of the spindle contained a toothed sprocket made 
of an asbestos laminate. This sprocket engaged notches on the 
upper flange of the test specimen permitting it to be rotated by 
the spindle. The spindle was mounted so that it could be tilted 
away from the specimen to permit removal of the specimen to 
inspect for cracks. 


Material 


Because of anticipated difficulties in producing high stresses 
in the test specimen, it was decided to use a material having a 
fairly low fatigue strength. At the same time, it was desired 
to employ a material which would minimize yielding during the 
test and which would have relatively little hysteresis. The 
material selected was 355-T61 cast aluminum alloy. The samples 
were sections 4'/, in. in diameter by 2'/, in. long from a rapidly 
chilled casting. The casting procedure was such that very little 
segregation of elements would occur at the center. Fig. 3 
shows a photomicrograph of a transverse section at the center. 
A chemical analysis of these samples yielded the following com- 
position by per cent: Cu 1.34, Fe 0.21, Si 5.09, Mg 0.44, Ti 0.14. 

The material was heat-treated by heating 12 hours at 980 F, 
quenching in boiling water, and aging 9 hours at 310 F. Static 
tests® of transverse specimens from the test cylinders provided 
the following results: 


Tensile strength                                     47,600 psi 
Tensile yield strength at 0.2 per cent offset        30,500 psi 
Elongation in 2 in.                                  8.0 per cent 
Reduction of area                                    7.0 per cent 


® Performed at the Aluminum Research Laboratories, Aluminum 
Company of America, New Kensington, Penn. 




Fig. 1 Specimen for rotating disk experiment 




Fig. 2 Apparatus for rotating disk experiment 


MARCH 1961 




Compressive yield strength at 0.2 per cent offset    31,000 psi 
Shear strength*                                      30,900 psi 


These strength values are 10 to 15 per cent higher than published 
values for sand castings of this alloy. Thus the rotating-beam 
fatigue strength was probably also higher than the published 
value for sand castings of 9 ± 2 × 10^3 psi at 5 × 10^8 cycles 
and 20 ± 5 × 10^3 psi at 10^5 cycles. The higher strength was 
probably due to the chill-casting technique. 

The hardness was observed to be 55-57 Rockwell B. 


Procedure 


The design studies suggested that lubrication of the loading 
pads would be very critical if stresses above 6000 psi were re- 
quired. These difficulties were realized, so various lubricants 
were tried in an attempt to find something that would permit 
operation of the machine without excessive metal-to-metal con- 
tact of the rubbing surfaces. Different viscosity lubricating oils 
with and without additives such as Molycote were tried without 
success. Finally, silicone base oils having a flat viscosity- 
temperature curve were tried. Dow-Corning 200 fluid blended 
to a viscosity of 300 c.s. was found to be satisfactory. 

Heating of course occurred during the testing. Because of the 


* Determined from a double-shear test on a 1/2-in. diameter specimen 
using an Amsler shear tool. 


Fig. 3 Photomicrograph of transverse section from center of 355-T61 cast 
aluminum alloy (courtesy Aluminum Company of America) 


Table 1 Results of fatigue tests of rotating disk specimens 


nature of the experiments, it did not seem worth the trouble to 
provide means for circulating or cooling the lubricant. Accurate 
control of stress or number of cycles, temperature, etc., was not 
necessary. The all-important observation was whether or not 
fatigue cracks would be produced. 

Thus the procedure employed was to insert the specimen, 
start it rotating at a speed of 3500 rpm (2100 rpm for the last 
test), and then apply the load rapidly. After the load had been 
applied for a given interval the load was removed and the machine 
stopped to allow it to cool and the specimen was removed and in- 
spected for cracks. This process was repeated at such loads and 
intervals as necessary. 


Results of Fatigue Tests 


The specimens did develop fatigue cracks on planes trans- 
verse to the disk as shown in Fig. 4. The first specimen failed at 
a load of 5000 lb after a varied history of lubricants and loads 
from 1800 to 5000 lb. Five additional specimens were tested 
to failure using silicone lubricating oil under conditions detailed 
in Table 1. 

It was interesting to note the spiral shape of the cracks that 
developed in all specimens, but it was distressing to observe the 
rapidity of propagation of the fatigue cracks once they had 
formed. The interval between inspections was shortened to as 
little as 2000 cycles in an attempt to observe the crack at an 


Fig. 4 Photograph showing fatigue cracks in specimen No. 3 


                        Approximate principal    Cycles                     Total 
                        stresses, psi            between                    cycles to 
Test   Load, lb         σ1         σ2            inspections   Cycles       failure 
1      Under 1,800                               20,000        925,000 
       2,900            4,400      -12,100       10,000        41,000 
       3,500 to 5,000                            5,000         118,000      1,084,000 
2      2,600            4,000      -10,800       5,000         13,000 
       3,250 to 4,000                                          42,000       55,000 
3      4,000            6,100      -16,700       10,000                     36,500 
4      3,550            5,400      -14,800       4,000                      232,000 
5      3,550            5,400      -14,800       10,000                     72,000 
6      2,600            4,000      -10,800       2,000         14,500 
       3,550*           7,200      -19,700       1,500         3,000        17,500 


* The lead of the fracture wire broke and had to be replaced before the 
test was continued at the higher load. During replacement the test 
section of the specimen was remachined to remove the Bakelite cement. 


early stage, but without success. The crack spread from no 
observable crack to one an inch or more in length in 1500 cycles 
or less. A spiral fracture wire was cemented to one specimen in an 
attempt to observe the specimen when a short crack had formed. 
However, an extensive crack had developed before the machine 
could be stopped. 

The fractured surfaces were considerably scored by rubbing 
during the latter stage of the test, so that positive determination 
of the origin of the crack was not possible. However, an examina- 
tion of the fracture patterns by the authors and also by some of 
the physical metallurgy staff of Hamilton Standard Division of 
United Aircraft led to the conclusion that the cracks originated 
near the center of the disk. The cracks were of the same nature 
on all specimens and consisted of a relatively straight portion 
about '/: in. long which passed within '/\. in. of the center of the 
specimen ('/, in. in one test), and spiral cracks originating at 
both ends of the straight crack. It was impossible to pinpoint 
the origin of cracking any closer than the straight central por- 
tion of the crack. However, this means that the origin of all 
cracks was certainly within about '/, in. from the center and may 
have been within '/, in. from the center for all specimens tested. 

The following facts, (a) the cracks were on planes normal to 
the disk, (b) the directions of scoring marks on fracture surfaces 
were parallel to the disk surface, and (c) the state of stress was a 
biaxial tension-compression, are all consistent with the proba- 
bility that fluctuating shearing stress on a critical plane was 
the chief cause of fatigue cracking. The active shearing stress 
was on a plane normal to the surface of the disk and in a direction 
parallel to the surface of the disk. 


Stress Analysis 


In order to determine the extent of the area of the disk over 
which the stresses and the strain energy were nearly constant in 
magnitude, stress analyses by means of photoelasticity and 
strain gages were employed. A qualitative photoelastic study 
was made by machining a duplicate of the specimen from cast 
epoxy resin. This disk was loaded between the steel-bronze 
pads of the original testing rig and later against pads of epoxy 
resin backed by a phenolic molding material. In the latter 
instance the ratios of elastic moduli were near to those in the 
actual test but not exactly the same. When the model was 
loaded approximately to the proportion of the load on the test 
specimen indicated by the ratio of the moduli of the two materials 
a roughly rectangular area of uniform shearing stress about 1 by 
1'/, inches was observed. However, since the fringe order was 
only about 1'/; fringes, the accuracy of this observation was of a 
low order. The stress distribution was also observed to be af- 
fected by the stiffness of the loading pads and by the intensity 
of the load. 

An analysis by means of SR-4 strain gages was also made to 
achieve greater accuracy. Gages having '/, in. gage length 
were mounted on each side of the disk at distances of 0, 0.3, 
0.55, 0.8, and 1.05 inches from the center on one of the alumi- 
num alloy specimens, using different orientations so as to form 
45 deg rosettes. This specimen was loaded through the original 
bronze-steel pads and strain gage readings were taken for several 
orientations of the specimen. The data obtained permitted 
the determination of the magnitude and direction of the principal 
stresses at nine points along three diameters: parallel to the 
direction of loading, perpendicular to the direction of loading, 
and at 45 deg to the direction of loading. The magnitudes of 
the principal stresses are shown in Fig. 5 for these three direc- 
tions. 
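
The reduction of 45 deg rosette readings to principal stresses follows standard relations, which may be sketched as below. This is an illustration only: the paper does not list the gage readings or the elastic constants used, so the strains shown are hypothetical and the modulus and Poisson's ratio are assumed typical values for an aluminum alloy.

```python
import math

E_ASSUMED = 10.3e6   # psi, assumed modulus for an aluminum alloy
NU_ASSUMED = 0.33    # assumed Poisson's ratio

def rosette_principal_strains(ea, eb, ec):
    """Principal strains from gages at 0, 45, and 90 deg.
    Returns (e1, e2, angle of e1 from gage a in degrees)."""
    mean = 0.5 * (ea + ec)
    radius = math.hypot(ea - eb, eb - ec) / math.sqrt(2.0)
    angle = 0.5 * math.degrees(math.atan2(2.0 * eb - ea - ec, ea - ec))
    return mean + radius, mean - radius, angle

def plane_stress_principal(e1, e2, modulus=E_ASSUMED, poisson=NU_ASSUMED):
    """Principal stresses from principal strains by Hooke's law for
    plane stress (third principal stress equal to zero)."""
    factor = modulus / (1.0 - poisson ** 2)
    return factor * (e1 + poisson * e2), factor * (e2 + poisson * e1)

# Hypothetical gage readings in in./in.; not data from the paper.
ea, eb, ec = 600e-6, -400e-6, -1200e-6
e1, e2, angle = rosette_principal_strains(ea, eb, ec)
s1, s2 = plane_stress_principal(e1, e2)
print("principal strains:", e1, e2, "at", angle, "deg from gage a")
print("principal stresses, psi:", s1, s2)
```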

An x-ray photograph of the specimen with strain gages at- 
tached showed that some of the strain gages were out of position. 
As the calculations of principal stresses were made before this 
was known, the values of some of the points plotted in Fig. 5 are 
inaccurate so were given less weight in drawing the curves shown. 


Fig. 5 Stress distribution in the aluminum disk when loaded through 
bronze shoes with a diametral force of 3000 lb: (a) parallel to direction 
of loading, (b) 45 deg to direction of loading, (c) perpendicular to 
direction of loading. Each panel plots stress/load against distance from 
center, in. 

It will be observed from Fig. 5 that the principal stresses re- 
mained nearly constant in magnitude over a substantial region of 
the central portion of the disk. Since the strain energy functions 
involve the three principal stresses (one of which is zero in the 
present instance) and the elastic constants, it follows that both 
of the strain energy functions (shear strain energy or total strain 
energy) were also substantially constant over much of the central 
portion of the disk. 

Using values of principal stresses obtained from the curves of 
Fig. 5 the shear strain energy was calculated at a radius of 0.3 in. 
for directions parallel, at 45 deg, and perpendicular to the direc- 
tion of loading. It was found that the strain energy did not 
vary sinusoidally. However, the extremes of the cycle varied 
only about ± 7 per cent from their mean value (at the radius 
of 0.3 in.). At a smaller radius the fluctuation was less. On the 
other hand, there was a complete reversal of the principal shearing 
stress, i.e., a variation of ±100 per cent. 

From the information given by the stress analysis as plotted 
in Fig. 5, the principal stresses corresponding to the loads imposed 
on the individual specimens were calculated and are shown in Table 
1. The alternating principal shearing stresses were also calculated 
and are shown in Fig. 6 as a function of the number of 
cycles to failure for the three tests which were performed at con- 
stant load. Also shown in Fig. 6 are the scatter bands for rotat- 
ing beam fatigue tests of this class of alloy as reported by the 
Aluminum Company of America. It is evident that the data 
fall well within the scatter band. From Fig. 6 the fatigue 
strength based on principal shearing stress may be estimated 
to be about 10,000 psi at 10^5 cycles for the disk specimen. This 
is of the same order of magnitude as the value of 10 ± 2.5 × 10^3 
psi for the shear stress corresponding to published data on the 
fatigue strength at 10^5 cycles for cast alloys as determined from 
rotating beam tests. The states of stress for the two tests, 
however, are different. The former is biaxial tension-compression 
and the latter is uniaxial. Also, the former is comparable 
to a compression mean stress in that there is a compression normal 
stress acting on the principal shear plane at both extremes 
of the cycle; whereas the latter is a completely reversed stress 
cycle for which a tensile normal stress acts on the principal shear 
plane at one extreme of the cycle and a compressive normal 
stress at the other extreme. The theory presented in a previous 
paper [6] suggests that the effect of these normal stresses would 
be to cause the fatigue strength to be greater in the disk specimen 
than the rotating beam specimen. 


Fig. 6 S-N diagram for rotating disk fatigue tests 
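
The alternating principal shearing stress follows directly from the principal stresses listed in Table 1 as half their difference. The short sketch below is an illustration, not the authors' calculation; it assumes that tests 3, 4, and 5 are the three constant-load tests and that the second principal stress is compressive, as the table is read here.

```python
# (sigma_1, sigma_2, cycles to failure) as read from Table 1; the choice
# of tests 3, 4, and 5 as the constant-load tests is an assumption.
CONSTANT_LOAD_TESTS = {
    3: (6100.0, -16700.0, 36500),
    4: (5400.0, -14800.0, 232000),
    5: (5400.0, -14800.0, 72000),
}

def principal_shear_stress(s1, s2):
    """Maximum in-plane shearing stress for principal stresses s1, s2."""
    return 0.5 * (s1 - s2)

shear_by_test = {
    test: principal_shear_stress(s1, s2)
    for test, (s1, s2, _) in CONSTANT_LOAD_TESTS.items()
}

for test, (_, _, cycles) in CONSTANT_LOAD_TESTS.items():
    print(f"test {test}: {shear_by_test[test]:.0f} psi at {cycles} cycles")
```

The resulting values of roughly 10,000 to 11,000 psi are in line with the estimate quoted above for the disk specimen.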

It should be mentioned again that, while the agreement noted 
above between the fatigue strength (based on principal shearing 
stress) of the disk specimens and rotating beam specimens is 
interesting, it was not at all necessary to the interpretation of 
the present experiment. The significant observation was that 
fatigue cracks were produced in the disk specimens in spite of the 
fact that there was no fluctuation in the applied strain energy 
at the critical location during the tests of the disk specimens. 


In view of the fact that fatigue cracks were produced in regions 
in which the strain energy was essentially constant during 
the test (except for the few times the load was removed for inspection 
and reapplied), it was evident that fluctuating strain 
energy could not be the prime cause of fatigue failure. Con- 
versely, it was inferred that fatigue failure must result from 
fluctuation of some component of stress or strain referred to 
particular planes of the material, such as a critical shear stress. 

Thus while strain-energy forms of expressions may be useful 
as design formulas for combined stress, the afore-mentioned ex- 
periment seems to prove that they are not valid for describing 
the mechanism of fatigue. Hence their use for describing fatigue 
under states of combined stress which are outside the range 
which has been verified by experiment may yield incorrect results. 


Microplastic Strain Hysteresis Energy 
as a Criterion for Fatigue Fracture 


In this paper an energy criterion for fatigue failure is postulated. Microplastic strain 
hysteresis energy is considered to be an index for fatigue damage. On this basis, a relation 
is developed between stress amplitude and the number of cycles to failure which 
utilizes only material properties obtained from the static true stress-strain tension test. 
The analysis is found to compare well with an experimentally determined S-N curve for 
SAE 4340 steel. 


When the problem of fatigue fracture was first 
studied nearly 100 years ago, investigators approached the prob- 
lem by considering both stress and deformation. It was soon 
learned that a plot of alternating stress versus the logarithm of 
the number of cycles to fracture was approximately a straight 
line. Stress alone furnished a convenient design criterion and 
was easy to control in laboratory tests. Fewer and fewer meas- 
urements of cyclic deformation were made, until about 1930 when 
nearly all effort in this direction ceased. 

Design is now, for the most part, based on a stress criterion. 
Allowable stresses are obtained by running S-N curves for the 
material to be used under the conditions it is expected to ex- 
perience in service. 

Studies in low-cycle fatigue, where high stresses or strains 
result in fatigue lives of less than 10,000 cycles, have indicated that 
the cyclic plastic-strain range or the width of the hysteresis loop 
is a much more satisfactory quantity to correlate with fatigue life 
than is the stress. 

The importance of measuring strain as well as stress has worked 
its way back into the philosophy of fatigue research. 

If a cyclically loaded material exhibited a perfectly linear elastic 
relation between stress and strain, that is, if there were no deterioration 
of the elastic energy, the material would be resistant to 
fatigue fracture. Obviously, to start with a specimen in one piece 
and then after the application of a finite number of load cycles 
find it to be in two pieces, requires a conversion of energy. The 
energy necessary to cause fracture is collected in small amounts 
during the course of the cyclic loading and is observable in terms 
of strain hysteresis. 

A few investigations concerning energy conversion and its re- 
lation to fatigue failure have been made. Hysteresis is intimately 
related to the field of material damping in which a number of 
studies have been conducted [1].¹ Only those investigations 
directly related to the study of hysteresis energy and fatigue be- 
havior will be mentioned here. 

The first study of this type was reported by Inglis [2] who 
measured the total energy to fracture of fatigue specimens sub- 
jected to rotating bending stresses. A plot of his results is shown 
in Fig. 1 which illustrates the manner in which the total energy is 
collected. 

1 Numbers in brackets designate References at end of paper. 

Contributed by the Metals Engineering Division of The American 
Society of Mechanical Engineers and presented at the Joint 
ASME-AWS Metals Engineering Conference, Los Angeles, Calif., 
April 25-29, 1960. Manuscript received at ASME Headquarters, 
October 14, 1959. Paper No. 60-Met-2. 

In 1947, Hanstock [3] recorded the total energy to fracture for 
an aluminum alloy in alternating torsion. He suggested the 
equation 

$$\Delta E = C + DN$$

where $\Delta E$ is the hysteresis energy per cycle, N is the number of 
cycles, and C and D are constants. The constant C was defined 
as the quantity of energy causing fracture while D was considered 
to be an energy quantity which did not contribute to the fracture 
process. Further experimental efforts [4] have been made in an 
attempt to verify this equation but have met with little success. 
Pardue, Melchor, and Good [5] and Duce [6] have also measured 
the total energy to fracture. 

Enomoto [7] has advanced the hypothesis that failure occurs 
under cyclic loading when the energy absorbed in each cycle in 
excess of a certain nondamaging amount accumulates to a critical 
total value. On the basis of this hypothesis, a theoretical σ-N 
relationship was developed which qualitatively agrees with the 
tendencies of experimental σ-N curves. The hypothesis embodies 
three assumptions: (a) The energy lost by internal friction, 
beyond a certain amount in one cycle of repeated stress, con- 
tributes to fatigue failure; (b) the total amount of internal lost 
energy per unit volume which has contributed to fatigue failure 
is constant; (c) in the vibration of metals there is the following 
relation between the logarithmic decrement $\delta$ and the stress 
amplitude $\sigma$: $\delta = k\sigma^{m}$. 

The analysis that accompanied these assumptions probably 
represents the first advance of a formalized hysteresis-energy 
criterion for fatigue fracture. No experimental results were re- 
ported in support of the theory. 
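
One way in which the three assumptions lead to a stress-life relation may be sketched as follows. This is not Enomoto's own derivation. It assumes the usual small-damping approximation that the energy lost per cycle is twice the logarithmic decrement times the peak elastic energy, it writes the decrement as a power of the stress amplitude with a hypothetical exponent `m`, and every numerical constant is invented for illustration.

```python
import math

def energy_per_cycle(sigma, k, m, modulus):
    """Energy lost per unit volume per cycle, using the small-damping
    approximation: loss = 2 * decrement * (sigma**2 / (2 * modulus))."""
    decrement = k * sigma ** m          # assumption (c)
    return decrement * sigma ** 2 / modulus

def cycles_to_failure(sigma, k, m, modulus, w_nondamaging, w_critical):
    """Cycles needed for the damaging part of the lost energy to reach
    the critical total; infinite if nothing exceeds the threshold."""
    excess = energy_per_cycle(sigma, k, m, modulus) - w_nondamaging
    if excess <= 0.0:
        return math.inf                 # at or below the fatigue limit
    return w_critical / excess          # assumptions (a) and (b)

# Hypothetical constants, chosen only to give a plausible curve shape.
K, M, E = 1.0e-12, 2.0, 30.0e6
W_NONDAMAGING = energy_per_cycle(50000.0, K, M, E)  # limit set at 50 ksi
W_CRITICAL = 1.0e5                                  # lb-in./in.^3

for sigma in (40000.0, 60000.0, 80000.0):
    life = cycles_to_failure(sigma, K, M, E, W_NONDAMAGING, W_CRITICAL)
    print(sigma, life)
```

The sketch reproduces the two qualitative features of an experimental curve: life falls as stress amplitude rises, and it becomes unbounded below a limiting stress.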

There are many different mechanisms responsible for energy 
conversion during cyclic loading which are sensitive to a number 
of variables. A study of these factors has been made in a previous 
paper [8]. 

Fig. 1 Total work done to fracture for low carbon steel [2] 


To study fatigue failure using an energy criterion requires a 
knowledge of the stress and plastic strain during a cyclic test. 
Very little has been done to measure the stress and plastic strain 
in the most important stress region from a fatigue viewpoint; that 
is, at and just above the fatigue limit. Measurements in this 
region are difficult to make due to the small inelastic strains 
which are involved. The present paper represents a preliminary 
attempt to measure directly the microinelastic strains which are 
present near the fatigue limit, with the intention of correlating 
the plastic-strain hysteresis energy with fatigue behavior. 

The scope of this investigation covers the testing of eight SAE 
4340 steel specimens in an axial fatigue machine under conditions 
of controlled stress. True stress-true strain information was ob- 
tained for the same material. 


Experimental Investigation 


The chemical analysis, heat-treatment, and engineering stress- 
strain properties of the SAE 4340 aircraft quality steel used in 
this investigation are given in Table 1. 


Table 1 Description of material 


Material:            Aircraft Quality SAE 4340 Steel 

Composition:         C - 0.40, Mn - 0.51, P - 0.016, 
                     S - 0.019, Si - 0.25, Ni - 1.73, 
                     Cr - 0.87, Mo - 0.24 

Heat Treatment:      The fatigue, tension, and hardness 
                     specimens were austenitized at 1520°F 
                     in a neutral salt bath, quenched in 
                     still oil at room temperature and 
                     tempered at 1200°F for one hour. After 
                     machining, the specimens were stress 
                     relief annealed one hour at 900°F. 

Tensile Properties:  Average tensile properties for two 
                     specimens are given in the table below. 
                     Knoop hardness of test blocks along 
                     with the approximate Rockwell C equivalent 
                     is also given. 

Yield              Tensile            Knoop hardness,    Approximate 
strength, psi      strength, psi      1-kg load          Rockwell C 
127,200            138,700            312                30.1 


Fig. 2 Static tension test specimen 


Static Tests. The true stress-strain tension curve for a material 
is a more absolute and genuine representation of the phenomenon 
of flow and fracture than is the usual engineering stress-strain 
curve [9]. For this reason true stress-strain data were obtained 
for the 4340 steel in a static tension test. 

Solid cylindrical specimens were machined to the geometrical 
configuration and dimensions shown in Fig. 2. The specimens 
were lightly polished after heat-treatment. Measurements of the 
reduced section were made to the nearest 0.0001 in. on a super- 
micrometer. 

An Instron tension testing machine with a crosshead speed of 
0.02 ipm was used to apply and measure the load. 

A modified version of a double ring diameter gage as described 
by Pian and D’Amato [10] was used to measure the decrease in 
diameter of the specimen. Four AD-7 wire-resistance strain 
gages were attached, as shown in Fig. 3, and electrically connected 
as an external four-arm bridge. This arrangement gives 400 per 
cent increase in bridge output and provides temperature com- 
pensation. 

Sensitivity of this gage was 36,300 microin. per in. of strain 
reading for a 1-in. change in opening of the contacts. Thus 
0.0001-in. change in opening of the contact gave a reading of 
almost 4 microin., which is approximately the precision to which 
the strain indicator could be read. 
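
The arithmetic behind this statement is a single multiplication, shown here as a small check. The function name is hypothetical; the sensitivity and the 0.0001-in. change are the values quoted above.

```python
# Indicated strain (microin./in.) per 1 in. change in contact opening,
# as quoted in the text.
SENSITIVITY = 36300.0

def indicated_reading(change_in_opening_in):
    """Strain-indicator reading for a given change in contact opening."""
    return SENSITIVITY * change_in_opening_in

reading = indicated_reading(0.0001)
print(f"reading for 0.0001 in.: {reading:.2f} microin./in.")
```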

A calibration curve was made before every test by means of a 
rod with step diameters from 0.900 to 0.1600 in. in 0.100-in. 
steps. 

A microhardness tester with a 1.0-kg load and a Knoop in- 
denter were used to determine hardness measurements on small 
blocks of material which were heat-treated with the specimens. 
Hardness values are given in Table 1. 

Specimens were fixed in the grips of the Instron machine with 
the diameter gage in place. The load was then applied and 
diameter readings were taken at approximately 25 intervals be- 
fore fracture of the specimen. The final diameter measurement 
was made when the specimen was removed from the machine. 
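
Load and diameter readings of this kind are usually converted to true stress and true strain with the relations sketched below, which assume constant volume and a uniform stress over the minimum section (no correction for necking). The paper does not give its reduction formulas, so this is an assumed standard treatment, and the readings shown are hypothetical.

```python
import math

def true_stress(load_lb, diameter_in):
    """Load divided by the current minimum cross-sectional area."""
    area = math.pi * diameter_in ** 2 / 4.0
    return load_lb / area

def true_strain(initial_diameter_in, diameter_in):
    """Natural strain from the diameter change, assuming constant
    volume: ln(A0 / A) = 2 ln(d0 / d)."""
    return 2.0 * math.log(initial_diameter_in / diameter_in)

D0 = 0.2500  # in., hypothetical initial diameter

# Hypothetical (load in lb, current diameter in in.) readings.
readings = [(5000.0, 0.2495), (6500.0, 0.2400), (6000.0, 0.2000)]

curve = [(true_strain(D0, d), true_stress(p, d)) for p, d in readings]
for strain, stress in curve:
    print(f"true strain {strain:.4f}, true stress {stress:.0f} psi")
```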


Fig. 4 Fatigue test specimen 


Cyclic Tests. Hysteresis loops were monitored during fatigue 
testing by means of simultaneous stress and strain measurements. 
All tests were performed at a fixed amplitude of alternating stress 
about a zero mean stress. 

The geometry of the specimen and its dimensions are shown 
in Fig. 4. The specimens were drilled through with a 5/16-in. 
gun drill and then rough-machined. After being heat-treated 
they were honed on the inside and finish polished on the outside. 
In order to minimize residual stresses due to machining, the speci- 
mens were stress-relief-annealed in a vacuum at 900 F for 1 hr. 

The testing machine used was a modification of the one re- 
ported by Findley [11]. A schematic diagram of it is shown in 
Fig. 5. The machine is a constant-amplitude-of-deflection type 
and has a capacity of approximately 3000 Ib. 

With the present technique and methods available, it is difficult 
to make sensitive stress-strain measurements when a cyclic load 
is applied at high frequencies. Since the normal operating frequency 
of the machine was 1200 cpm, it was necessary to install 
an auxiliary drive system for low-frequency cycling. The frequency 
of operation with this system was '/: cpm and was slow 
enough to permit stress-strain measurements with a static strain 
indicator. It is known that frequency has an effect on the energy 
converted per cycle; however, for the range of frequencies used, 
this effect is small [8]. 


Fig. 5 Schematic representation of apparatus 

Load was measured by use of a ring dynamometer employing 
four wire-resistance strain gages. The gages were arranged in a 
four-arm external bridge giving a 400 per cent increase in bridge 
output and temperature compensation. 




Fig. 6 Detail of strain gage arrangement on straight section of test 
specimen 


Several methods of strain measurement were considered in the 
light of sensitivity and simplicity. Ultimately, four ABD-7 
bakelite gages were attached to the specimen in the form of a 
four-arm external bridge as shown in Fig. 6. This arrangement 
does not lend itself to simplicity since it requires a large supply of time, 
patience, and bakelite strain gages. However, it does give good 
sensitivity. The four-arm external bridge with the two Poisson- 
effect gages gives a 260 per cent increase in bridge output and 
provides temperature compensation. 

Due to the smooth surface finish on the specimens, nitrocellulose 
bonded gages became unbonded. Bakelite gages were then 
used in an effort to achieve a better bond and more stability. 
Even with these, gage slip was encountered at the higher stresses 
and false negative permanent strains were observed. 

An extensometer with a lower sensitivity was used to check 
continuously the stability of the strain gages on the specimen. 
It consisted of two heat-treated steel loops with wire-resistance 
strain gages mounted in the same manner as on the load cell. 
The loops were founded on the shoulders of the specimen to avoid 
fatigue failure at the gage-point marks. The extensometer was 
calibrated for each test by comparing the readings from the 
gages on the specimen with those of the extensometer. In case 
the strain gages failed, the extensometer was used for strain read- 
ings during the remainder of the test. 

Additional details on the test procedure will be found in the 
Appendix and reference [12]. 


Experimental Results and Discussion of Trends 


Results of Tests. The results of the static tension tests are given 
in Table 2 and Fig. 7 in the form of true-stress-strain properties 
and the true-stress-strain curve. Fig. 7 was plotted from the data 
obtained from two specimens. 

The fatigue data are listed in Table 2. Fig. 8 shows the results 
of the cyclic stress-strain measurements. Not only the maximum 
stress values, but all values of stress at which inelastic strains 
were measured are plotted in Fig. 8. These data were obtained 
early in the life (N < 3 per cent of the life to fracture) of six specimens. 
Values for tension and compression have been plotted 
together. 
2s 
200 
3 
10 + 
L 
% 3 4 x 7 
True Strain, , tinsin) 
Fig. 7 True stress-strain curve 
Table 2 Summary of test results

STATIC
Spec. No.   True stress, psi   True strain, in/in.   U, lb-in/in.^3   n
S-1         218,500            0.840                 151,300          0.094
S-2         228,200            0.895                 164,500          0.103

CYCLIC
Spec. No.   $\pm\sigma_a$, psi   $N_f$           Total energy to fracture, $W_f$, lb-in/in.^3
9           45,000               543,000 (1)     287,000 (1)
4           56,000               1,800,000 (1)   1,850,000 (1)
3           60,000               153,200         232,000
1           65,000 (2)           257,800         576,000
12          72,000               67,100          no data
2           75,000               34,200          158,000
10          85,000               1,850           329,000
8           90,000               1,700           no data

(1) Did not fail; considered runout
(2) Tested for a few cycles at ±35,000 psi
Fig. 9 Logarithmic stress-plastic strain curve


Discussion of Trends. Static Tests. Below the ultimate load, the
true plastic strain is related to the true stress [13] by
$\epsilon_p = k\sigma^{1/n}$. A plot of this equation on logarithmic
co-ordinates is a straight line. The tension test data have been
plotted in this manner and are shown in Fig. 9. The slope of this
line, n, is known as the strain-hardening exponent. The values of n
obtained from the two specimens are listed in Table 2. The data for
both specimens were combined and the slope computed by the method of
least squares. The least-squares value of the slope was n = 0.0987,
which agrees closely with the average of the values taken from
the two curves, n = 0.0985.
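
The least-squares determination of n can be sketched as follows. This is only an illustration: the data points are synthetic, generated from the power law with n = 0.0985 through the point (151,000 psi, 0.08 in/in.) quoted later in the paper, and are not the measured points of Fig. 9. The function name is hypothetical.

```python
import math

def fit_strain_hardening(stresses, plastic_strains):
    """Fit log10(stress) = n * log10(strain) + b by least squares.

    Returns (n, k), where n is the slope on log-log co-ordinates and
    k is the coefficient in eps_p = k * sigma**(1/n).
    """
    xs = [math.log10(e) for e in plastic_strains]
    ys = [math.log10(s) for s in stresses]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx
    b = y_mean - n * x_mean      # log10 of the stress at unit plastic strain
    k = 10.0 ** (-b / n)         # from sigma = 10**b * eps**n
    return n, k

# Synthetic (not measured) points lying exactly on the assumed power law.
strains = [0.01, 0.02, 0.04, 0.08]
stresses = [151_000.0 * (e / 0.08) ** 0.0985 for e in strains]

n_fit, k_fit = fit_strain_hardening(stresses, strains)
print(f"n = {n_fit:.4f}")
```

With real data the points scatter about the line, and the fitted slope plays the role of the combined value n = 0.0987 reported in the text.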

Cyclic Tests. The data in Fig. 8 contain a large amount of
scatter for two reasons: (a) they represent tension and
compression data from six different specimens, and (b) it is
difficult to measure accurately the small inelastic strains.

The line drawn through the data in Fig. 8 has a slope of one 
half and the data are evenly scattered around the line in the re- 
gion of lower stresses. As the stress increases the data tend to fall 
to the right of this line. 

Fig. 10 shows a sample of the cyclic stress-strain behavior for a
stress amplitude of 85 ksi. At a stress of this magnitude the
hysteresis loop grows rapidly. Fig. 11 gives the variation of
hysteresis energy with the number of cycles and stress amplitude.
The areas of the hysteresis loops, $\Delta W$, for this plot were
obtained by numerically integrating the cyclic stress-inelastic
strain data. The straight line on the semilog plot shows the
energy per cycle to be governed by a growth-decay process such
that $\Delta W = Ce^{cN}$, where C is the $\Delta W$ value for N = 0,
e is the base of natural logarithms, and c is the slope of the
lines in Fig. 11.
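
A minimal sketch of the growth-decay relation follows, assuming the form $\Delta W = Ce^{cN}$ with natural logarithms. The function names and the numerical values in the usage lines are hypothetical and are not taken from Fig. 11.

```python
import math

def loop_energy(C, c, N):
    """Hysteresis energy of cycle N for the growth-decay law dW = C * exp(c * N)."""
    return C * math.exp(c * N)

def growth_constant(dw_a, N_a, dw_b, N_b):
    """Constant c from two (N, dW) readings on the straight semilog line.

    Natural logarithms are used, so c is the slope of ln(dW) versus N.
    """
    return math.log(dw_b / dw_a) / (N_b - N_a)

def total_energy(C, c, n_cycles):
    """Sum of dW from N = 0 through N = n_cycles."""
    return sum(loop_energy(C, c, N) for N in range(n_cycles + 1))

# Hypothetical readings: loop area of 100 lb-in/in.^3 at N = 0, growing with c = 1e-3.
c_est = growth_constant(loop_energy(100.0, 1e-3, 0), 0,
                        loop_energy(100.0, 1e-3, 500), 500)
print(c_est, total_energy(100.0, c_est, 1000))
```

A positive c corresponds to a growing loop (high stress amplitudes), a negative c to a decaying loop, and c = 0 to constant energy per cycle.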

For specimens tested below ±70 ksi, the change in the hysteresis
loop was found to be small. The specimens tested below
the fatigue limit showed either a constant value or a slight
decrease of $\Delta W$.


Fig. 10 Sample of stress-strain behavior for large stress amplitude (spec.
No. 10)

Fig. 12 Relation between stress amplitude and total energy to fracture


Interpretation of Static and Cyclic Stress-Strain Results. Fig. 12
shows the relation between stress amplitude and total energy to
fracture. The point $(\sigma_f, U)$ from the static tension test has
been plotted on this curve since it is a measure of the minimum
amount of energy that is required to fracture a specimen. The total
hysteresis energy at N cycles, $W_N$, is the sum of the plastic
hysteresis energy $W_p$ and the anelastic hysteresis energy $W_z$, or
in equation form, $W_N = W_p + W_z$. If it is assumed that a
constant total amount of plastic hysteresis energy is required to
cause fatigue failure and that the anelastic energy is also
accumulated with each cycle, then a curve of the type shown in Fig. 12
would result. The scatter in the values of total work to failure
$W_f$ is of the same order of magnitude as the scatter in the fatigue
life, since the values of $W_f$ were obtained by summing $\Delta W$
from N = 0 to N = $N_f$. The curve of Fig. 12 is generated either by a
large amount of energy per cycle (predominantly plastic energy),
with failure occurring in a few cycles; or by a small amount
of energy per cycle (predominantly anelastic energy), with failure
occurring in a large number of cycles.
Fig. 13 Internal friction curve for annealed copper showing effect of
stress amplitude [14]

Fig. 14 Variation of inelastic strain with stress


A clearer picture of the manner in which hysteresis energy is 
converted in a material can be obtained from data reported by 
Mason [14] shown in Fig. 13. 

In the low-stress region, Fig. 13, the hysteresis energy is due to 
anelastic dissipating mechanisms (nondamaging, e.g., magneto- 
elastic coupling, atomic diffusion, and so on). In the high-stress 
region the hysteresis energy is due to plastic deformation (damag- 
ing in a fatigue sense). Between these two regions a transition 
zone exists in which the energy per cycle is composed of both 
anelastic and plastic hysteresis energy. This transition zone oc- 
curs in the region of the fatigue limit. The presence of the 
anelastic hysteresis energy in this region is the reason why very 
large amounts of total energy may be measured. The assumption
that fatigue failure represents the attainment of some limiting
amount of total plastic hysteresis energy $W_p$ and that this total
plastic energy quantity is constant does not mean that the total
energy $W_f$ is constant.

In Fig. 14, the stress-inelastic strain data of this investigation
shown in Figs. 8 and 9 are replotted on the same logarithmic
diagram. This type of plot exhibits two distinct regions and a
transition zone similar to Fig. 13. In the region of low stresses
the inelastic strain is largely composed of anelastic strain, since
plastic strain decreases rapidly as the stress decreases. The
relationship between stress and anelastic strain for a ferromagnetic
material may be expressed as [8], $\sigma = k_z\epsilon_z^{1/2}$,
where $\epsilon_z$ is the anelastic strain.² This relationship results
from the fact that internal friction for a relatively strong
ferromagnetic material is linearly related to stress [15]. The line
labeled “anelastic” in Fig. 14 has a slope of one half.

² The z-subscripts have been chosen from the first letter of the
name of C. Zener, who did pioneering work in this field and coined
the word “anelasticity.”

In the high-stress region the ratio of the plastic strain to the 
anelastic strain is very large. Therefore it is a reasonable as- 
sumption to consider the inelastic strain in this region as entirely 
plastic. 

Summing up, inelastic strain is composed of anelastic strain 
and plastic strain. For high stresses the plastic strain dominates. 
For stresses below the fatigue limit the anelastic strain domi- 
nates. For stresses near the fatigue limit the two may be com- 
bined to form the curved portion (transition zone) of the solid 
line shown in Fig. 14. 
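
The composition of the inelastic strain can be sketched numerically as below. The plastic branch uses the values n = 0.0985 and the point (151,000 psi, 0.08 in/in.) given elsewhere in the paper; the anelastic coefficient K_Z is a hypothetical value chosen only so that the two branches cross in the neighborhood of 55 ksi, and all function names are illustrative.

```python
N_EXP = 0.0985                        # strain-hardening exponent from the static tests
SIGMA_1, EPS_1 = 151_000.0, 0.08      # convenient point on the plastic line (psi, in/in.)
K_Z = 3.3e7                           # hypothetical anelastic coefficient, psi

def plastic_strain(sigma):
    """Plastic strain from the power law eps_p = eps_1 * (sigma / sigma_1)**(1/n)."""
    return EPS_1 * (sigma / SIGMA_1) ** (1.0 / N_EXP)

def anelastic_strain(sigma):
    """Anelastic strain from sigma = k_z * eps_z**0.5, inverted for eps_z."""
    return (sigma / K_Z) ** 2

def inelastic_strain(sigma):
    """Total inelastic strain as the sum of the two components."""
    return plastic_strain(sigma) + anelastic_strain(sigma)

def local_log_slope(sigma):
    """Slope d(log sigma)/d(log eps) of the combined curve at a given stress."""
    eps_p = plastic_strain(sigma)
    eps_z = anelastic_strain(sigma)
    # d(eps)/d(ln sigma) is eps_p / n for the plastic part and 2 * eps_z for the anelastic part
    return (eps_p + eps_z) / (eps_p / N_EXP + 2.0 * eps_z)

for sigma in (20_000.0, 55_000.0, 120_000.0):
    print(sigma, inelastic_strain(sigma), local_log_slope(sigma))
```

Under these assumptions the local slope is close to one half at low stress and close to n at high stress, with a curved transition between the two.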

By definition, fatigue failure does not occur below the fatigue 
limit where the inelastic strains are predominantly anelastic. 
Thus anelastic strain is nondamaging from a fatigue viewpoint. 
The significant inelastic-strain hysteresis energy which causes 
fatigue failure must then be the plastic strain hysteresis energy. 


Analysis 


Using the model of a stress-plastic strain hysteresis loop in
Fig. 15, it is possible to generate an S-N curve using only the
static true stress-strain properties of the material. To do this,
plastic hysteresis energy is used as the criterion for fatigue
fracture.


Fig. 15 Model of $\sigma$ - $\epsilon_p$ hysteresis loop


Theoretical S-N Relationship.³ For a reversed stress amplitude of
$\sigma_a$, the area under the $\sigma$ - $\epsilon_p$ hysteresis loop
for a single cycle, $\Delta W$, may be written as,


$$\Delta W = 2\int_0^{\Delta\epsilon_p} \sigma \, d\epsilon_p \tag{1}$$


where $\sigma$ is some general stress level and $\epsilon_p$ is the
plastic strain at that stress level.

In Fig. 11 it was shown that the energy per cycle changes with
number of cycles for large stresses. However, for stresses in the
region of ordinary fatigue (lives greater than about $5 \times 10^4$
cycles) there is only a small change in the energy per cycle. Thus
it can be assumed with a small error that the energy per cycle is
dependent only on stress and is independent of duration of test.

If the experimentally determined equation for $\Delta W$ is used to
account for the growth and decay of the hysteresis loop, Equation
(1) would be as follows:


$$\Delta W = Ce^{cN} = \left[ 2\int_0^{\Delta\epsilon_p} \sigma \, d\epsilon_p \right] e^{cN} \tag{1a}$$


where C is the area of the hysteresis loop for N = 0 and is equal
to the area within the static true stress-true plastic strain
hysteresis loop, Fig. 15. Since the hysteresis-loop area is assumed
to be a constant (constant energy per cycle) then c = 0 and
$e^{cN} = 1$. Application of this assumption to equation (1a) results
in equation (1). The constant-energy-per-cycle assumption is not
valid in the low-cycle fatigue region but is approximately true for
lives greater than $5 \times 10^4$ cycles. Equation (1a) is not used
in the remainder of the analysis since it was desired to keep the
analysis free from experimental parameters determined from cyclic
tests.

³ The idea of this analysis was first advanced by JoDean Morrow in
“Speculative Remarks and Analysis Concerning Mechanical Hysteresis
Energy as a Criterion for Fatigue Failure,” a private communication
to about fifty researchers in the field of flow and fracture and the
subject of a talk presented to members of the technical staff of the
General Electric Company, Evendale, Ohio, March, 1959.

The total energy converted by a specimen via damaging plastic-
strain hysteresis in N cycles, $W_p$, can be written as


$$W_p = 2N\int_0^{\Delta\epsilon_p} \sigma \, d\epsilon_p \tag{2}$$


The quantity $W_p$ is assumed to gradually increase, damage
occurring on each cycle, until the capacity of the material to con- 
vert damaging energy is exceeded, at which time the body fails. 
It will be assumed as a first approximation that the total damaging 
energy to fracture in a fatigue test is constant and identical with 
the total energy to fracture in the static tension test U, Fig. 12, 
the area under the static true stress-strain curve. 

The only portion of the inelastic strain which is considered to
cause fatigue damage is the plastic strain, $\epsilon_p$, shown in
Fig. 14. That due to the anelastic strain, also shown in Fig. 14, is
considered to be nondamaging. It is probably impossible to measure
$\epsilon_p$ at the stresses of interest because the anelastic strain
$\epsilon_z$ is the same order of magnitude as $\epsilon_p$, making it
necessary to separate the two. For want of measurements at stresses
in the region of fatigue failure, it is assumed that the $\epsilon_p$
curve, Fig. 9, taken in the high-stress region, where it is easily
measured and is the dominating inelastic strain, can be extrapolated
back to the low stresses of interest.


A logarithmic plot of $\sigma$ and $\epsilon_p$ is linear as can be
seen from Figs. 9 and 14. Thus


$$\frac{\epsilon_p}{\epsilon_1} = \left(\frac{\sigma}{\sigma_1}\right)^{1/n} \tag{3}$$


where $\sigma_1$ and $\epsilon_1$ are any convenient corresponding
values of true stress and true plastic strain taken in the region
where plastic strain dominates and n is the slope of the curve in
Fig. 9.


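Differentiating equation (3) with respect to $\sigma$ and substituting
into Equation (2) gives


$$W_p = \frac{2kN}{n}\int \sigma^{1/n} \, d\sigma \tag{4}$$


where k is the coefficient of the power law
$\epsilon_p = k\sigma^{1/n}$ expressed by equation (3). When
$\epsilon_p = 0$, $\sigma = 0$ and when $\epsilon_p = \Delta\epsilon_p$,
$\sigma = \sigma_a$. Changing limits and integrating between 0 and
$\sigma_a$ gives


$$W_p = \frac{2kN}{1+n}\,\sigma_a^{(1+n)/n} \tag{5}$$


At fracture $N = N_f$ and $W_p = U$. Substituting in equation (5)
and rearranging terms leaves


$$\sigma_a^{(1+n)/n} = \frac{U(1+n)}{2kN_f} \tag{6}$$


Taking the (1 + n)/n root of both sides, and writing the equation
in log form,


$$\log\sigma_a = \frac{n}{1+n}\log\frac{U(1+n)}{2k} - \frac{n}{1+n}\log N_f \tag{7}$$


Using the symbol K for the first term of equation (7), which
is constant for a specified material and set of test conditions, a
simple equation for an S-N curve can be written,


$$\log\sigma_a = K - \frac{n}{1+n}\log N_f \tag{8}$$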


This relationship, which is based on a limiting constant plastic
strain hysteresis energy to fracture, plots as a straight line on a
$\log\sigma_a$ versus $\log N_f$ curve, Fig. 16. The slope of the line
is $-n/(1+n)$, and the intercept at one cycle ($\log N_f = 0$) is K.
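
The integration step behind the energy criterion can be checked numerically. The sketch below assumes the power law $\epsilon_p = k\sigma^{1/n}$ and takes the plastic energy per cycle as twice the integral of stress over plastic strain from 0 to $\Delta\epsilon_p$; it compares a midpoint-rule integration with the closed form $2k\sigma_a^{(1+n)/n}/(1+n)$. The function names and the 70,000-psi amplitude are illustrative choices, not values from the paper's analysis.

```python
def energy_per_cycle_numeric(sigma_a, k, n, steps=20_000):
    """Midpoint-rule value of 2 * integral of sigma d(eps_p) from 0 to delta_eps_p."""
    delta_eps = k * sigma_a ** (1.0 / n)     # plastic strain reached at sigma_a
    h = delta_eps / steps
    area = 0.0
    for i in range(steps):
        eps = (i + 0.5) * h
        area += (eps / k) ** n * h           # sigma = (eps_p / k)**n
    return 2.0 * area

def energy_per_cycle_closed(sigma_a, k, n):
    """Closed-form result of the same integral."""
    return 2.0 * k * sigma_a ** ((1.0 + n) / n) / (1.0 + n)

n = 0.0985
k = 0.08 / 151_000.0 ** (1.0 / n)            # from the point (151,000 psi, 0.08 in/in.)
numeric = energy_per_cycle_numeric(70_000.0, k, n)
closed = energy_per_cycle_closed(70_000.0, k, n)
print(numeric, closed)
```

Dividing the static energy U by the energy per cycle gives the number of cycles at which the accumulated plastic energy reaches U, which is the content of the S-N relation.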

Qualitative Comparison With Known Fatigue Behavior. One method
of plotting usual fatigue data for lives between $10^4$ and $10^7$
cycles is a $\log\sigma_a$-$\log N_f$ curve. Such a plot tends to make
the data fall on a straight line as shown in Fig. 16.


Fig. 16 S-N curve obtained using energy as a criterion for fatigue


At least in ferrous materials, the S-N diagram exhibits a fatigue 
limit or a deviation from the straight line for low stresses as shown 
by the dashed line to the right in Fig. 16. The analysis presented 
in the preceding section does not predict such a fatigue limit. 

For high stresses the logarithmic S-N diagram also deviates 
from linearity as shown by the dashed line at the top of Fig. 16. 
The present analysis does not predict such a deviation. Since 
it is known that the hysteresis loop increases in area during the 
fatigue test at high stress amplitudes, Fig. 11, then incorporating
this fact in the analysis, equation (1a), would produce a
fatigue-life curve at high stresses similar to the dashed line in Fig. 16.

Comparison of Experimental Results and Analysis. As pointed out 
in the preceding section, equation (8) does not fit in the high- 
stress region or in the low-stress region of fatigue (below the 
fatigue limit). In the intermediate range, which is where most 
experimental S-N curves are established, equation (8) is reasona- 
bly valid. 

Equation (8) was applied to the fatigue-life data of this investi- 
gation and that of a previous investigation [16] on the same 
material. Only static true stress-strain data are needed to apply 
equation (8). The static tension data for the two specimens re- 
ported in Table 2 are averaged for this purpose and these average 
values are: U = 158,000 lb-in/in.^3 and n = 0.0985. For
the values of $\sigma_1$ and $\epsilon_1$ a convenient point in Fig. 9
was chosen. This point was $\sigma_1$ = 151,000 psi and
$\epsilon_1$ = 0.08 in/in. Using these values the dotted line in
Fig. 17 was constructed.
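
A sketch of how such a line can be computed from the static data alone is given below. It assumes the S-N relation in the form log σ_a = K - [n/(1+n)] log N_f with K = [n/(1+n)] log[U(1+n)/(2k)] and k = ε_1/σ_1^(1/n); the function names are hypothetical and the sketch is not a reproduction of the authors' calculation.

```python
import math

U = 158_000.0          # static energy to fracture, lb-in/in.^3
N_EXP = 0.0985         # strain-hardening exponent
SIGMA_1 = 151_000.0    # psi, convenient point on the plastic line
EPS_1 = 0.08           # in/in., plastic strain at SIGMA_1

def sn_constants(u, n, sigma_1, eps_1):
    """Return (K, slope) of the straight line on log-log co-ordinates."""
    log_k = math.log10(eps_1) - math.log10(sigma_1) / n   # log10 of k
    slope = -n / (1.0 + n)
    K = (n / (1.0 + n)) * (math.log10(u * (1.0 + n) / 2.0) - log_k)
    return K, slope

def stress_amplitude(n_f, K, slope):
    """Stress amplitude (psi) predicted for a life of n_f cycles."""
    return 10.0 ** (K + slope * math.log10(n_f))

def fatigue_life(sigma_a, K, slope):
    """Life in cycles predicted for a stress amplitude sigma_a (psi)."""
    return 10.0 ** ((math.log10(sigma_a) - K) / slope)

K, slope = sn_constants(U, N_EXP, SIGMA_1, EPS_1)
for sigma_a in (60_000.0, 65_000.0, 72_000.0, 75_000.0):
    print(f"{sigma_a:,.0f} psi -> {fatigue_life(sigma_a, K, slope):,.0f} cycles")
```

The predicted lives can be set beside the measured values of $N_f$ in Table 2 for the same stress amplitudes.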

The agreement between the S-N curve predicted from the
static stress-strain curve using plastic-strain hysteresis energy as
a criterion and the experimental points is good except for the two
portions of the curve which have been mentioned previously.

Fig. 17 Comparison of test results with analysis


Summary and Conclusions 


Hysteresis energy has been postulated as a criterion for fatigue 
failure. Specifically, the plastic-strain energy portion of the 
hysteresis energy has been assumed to account for the damaging 
effects. Cyclic and static stress-strain studies have been con- 
ducted on SAE 4340 steel. An analytical relation between stress 
amplitude and fatigue life has been derived on the basis of four 
assumptions: (i) Plastic-strain hysteresis energy is a measure of 
fatigue damage; (ii) a logarithmic plot of static true stress versus 
true plastic strain is valid when extrapolated back into the fatigue 
stress region; (iii) the damaging energy per cycle for a given stress 
amplitude is constant and is equal to the area under the static 
stress-plastic strain curve depicted in Fig. 15; and (iv) the total 
damaging energy required to cause fatigue fracture is constant 
and as a first approximation is equal to the area under the static 
true stress-true strain curve in Fig. 7. The analysis is compared
with the experimental results.

For the material used in this particular investigation it may be
concluded that: (i) Plastic-strain hysteresis energy provides a
basis for the prediction of the S-N curve obtained; (ii) the static
true stress-strain curve provides the necessary information to
predict fatigue life for stress amplitudes near and slightly above the
fatigue limit; (iii) cyclic stress-strain measurements serve to
elucidate two facts; (a) the rapid growth of the hysteresis loop in 
the high fatigue-stress region accounts for the erroneous predic- 
tion of longer lives than actually measured, and (b) the slight 
decay of hysteresis energy for stresses at or below the fatigue 
limit demonstrates that damaging plastic-strain hysteresis energy 
dies out due to cyclic strain-hardening before the critical total 
plastic energy to cause fracture is accumulated. 

The incorporation of the observations in parts (a) and (b) of 
the foregoing paragraph into the analysis would tend to correct 
for the erroneously predicted long lives at high cyclic stresses
and would introduce a fatigue limit. At present, observation (a)
can be incorporated by using the hysteresis-loop growth-decay
equation (1a). However, it was not included because of the
empirical nature of this relation. 

Future investigations should be extended to a variety of ma- 
terials to investigate further the validity of microplastic-strain 
hysteresis energy as a criterion for fatigue fracture. 

Static true stress-strain measurements may never replace ex- 
perimental S-N curves, but at least this small amount of success 
offers some hope of reducing the number of cyclic tests required. 


Journal of Basic Engineering 


Acknowledgment 


This investigation was conducted in the Department of Theo- 
retical and Applied Mechanics, as a part of the work of the Engi- 
neering Experiment Station of the University of Illinois. The 
subvention of Mr. C. E. Feltner through a University of Illinois 
graduation fellowship and the support of the Evendale Plant 
Laboratory of the General Electric Company under Contract 
Code Number 46 22 60 334 made this work possible. 

Prof. G. M. Sinclair contributed many helpful suggestions. 
Appreciation is due Mr. G. R. Halford who carried out the 
majority of the experimental tests. Mr. R. L. Moline helped in 
the reduction of data and preparation of figures. The manuscript 
was typed by Mrs. Nancy Dahl. 


References 


1 L. J. Demer, “Bibliography of the Material Damping Field,”
WADC Technical Report 56-180, June, 1956.

2 N. P. Inglis, “Hysteresis and Fatigue of Wöhler Rotating
Cantilever Specimen,” The Metallurgist, February, 1927, pp. 23-27.

3 R. F. Hanstock, “Damping Capacity, Strain Hardening and
Fatigue,” Proceedings, Physical Society, vol. 59, 1947, pp. 275-287.

4 P. G. Forrest and H. J. Tapsell, “Some Experiments on the
Alternating Stress Fatigue of a Mild Steel and an Aluminum Alloy
at Elevated Temperatures,” Proceedings, I.Mech.E., vol. 168, 1954,
p. 763.

5 T. E. Pardue, J. L. Melchor, and W. B. Good, “Energy Losses
and Fracture of Some Metals Resulting From a Small Number of
Cycles of Strain,” Society Experimental Stress Analysis, vol. VII,
No. 11, 1949, pp. 27-39.

6 A. G. Duce, “A Study of Some Fatigue Phenomena in Pure
Metals and Alloys,” PhD Thesis, University of Cambridge, England,
1950, 225 pp.

7 Nobusuke Enomoto, “On Fatigue Tests Under Progressive
Stress,” Proceedings, ASTM, vol. 55, 1955, p. 903.

8 C. E. Feltner, “Strain Hysteresis, Energy, and Fatigue
Fracture,” TAM Report No. 146, Department of Theoretical and Applied
Mechanics, University of Illinois, June, 1959.

9 C. W. MacGregor, “True Stress-Strain Tension Test - Its
Role in Modern Materials Testing,” Journal of Franklin Institute,
vol. 238, nos. 2 and 3, August and September, 1944, pp. 111-135 and
159-176.

10 T. H. H. Pian and R. D’Amato, “Low-Cycle Fatigue of
Notched and Unnotched Specimens of 2024 Aluminum Alloy Under
Axial Loading,” WADC Technical Note 58-27, February, 1958.

11 W. N. Findley, “New Apparatus for Axial-Load Fatigue
Testing,” ASTM Bulletin No. 147, 1947, pp. 54-56.

12 C. E. Feltner and J. Morrow, “Micro-Plastic Strain Hysteresis
Energy as a Criterion for Fatigue Fracture,” TAM Report No. 576,
Department of Theoretical and Applied Mechanics, University of
Illinois, Urbana, Illinois, May, 1959.

13 J. R. Low and F. Garofalo, “Precision Determination of
Stress-Strain Curves in the Plastic Range,” Society Experimental
Stress Analysis, vol. IV, no. II, 1947.

14 W. P. Mason, “Internal Friction, Plastic Strain, and Fatigue
in Metals and Semiconductors,” American Society for Testing
Materials, STP No. 237, 1958, p. 36.

15 N. L. Person and B. J. Lazan, “The Effect of Static Mean
Stress on the Damping Properties of Materials,” Proceedings, ASTM,
vol. 56, 1956, p. 1399.

16 J. Morrow and G. M. Sinclair, “Cycle-Dependent Stress
Relaxation,” American Society for Testing Materials, STP No. 237,
Symposium on Basic Mechanisms of Fatigue, 1958.


APPENDIX 
Test Procedure 


Bakelite strain gages were mounted on the specimen test sec- 
tion and were then subjected to a drying cycle, the maximum 
temperature reached being 250 F. 

The specimen was placed in the machine free of initial stress 
with the aid of the distortion detector described in Reference [11]. 
The electrical circuit for the specimen strain gages was completed
and the extensometer mounted in place. 

With no load on the specimen, zero readings for the extensom- 
eter, load cell, and specimen strain gages were recorded. The 
first few cycles were applied by hand while the eccentric was ad- 
justed to insure completely reversed stressing. The low-fre- 
quency drive system could then be turned on and the machine 
allowed to apply cyclic loading slowly. The 1200-cpm drive sys- 
tem was operated intermittently until the specimen fractured, or 
the test was stopped. Measurements of stress-strain loops were 
made periodically throughout the test. 
The procedure varied somewhat from one test to another.
However, the method of taking readings was the same. The 
technique consisted of taking strain readings for the same loading 
and unloading values of stress. The method is analogous to the 
case in which a specimen is dead-weight loaded. As each weight 
is added a strain reading is recorded until the maximum desired 
stress is reached. Strain readings are again recorded as each 
weight is removed, thus giving strain readings at the same stress 
level for the loading and unloading portions of the hysteresis loop. 
This is the principle by which the mechanical hysteresis loops 
were measured. 
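
The way a loop area follows from such paired readings can be sketched as below. The sketch assumes that strain readings are available at the same stress levels on the loading and unloading branches and takes the loop area as the integral, over stress, of the strain difference between the two branches (trapezoidal rule). The function name, the elastic modulus, and the readings are hypothetical and are not data from these tests.

```python
def hysteresis_loop_area(stresses, loading_strains, unloading_strains):
    """Area enclosed between the loading and unloading branches.

    stresses: increasing stress levels at which both branches were read (psi)
    loading_strains, unloading_strains: strain readings at those stress levels
    Returns energy per unit volume (lb-in/in.^3 when stress is in psi).
    """
    widths = [u - l for l, u in zip(loading_strains, unloading_strains)]
    area = 0.0
    for i in range(1, len(stresses)):
        mean_width = 0.5 * (widths[i] + widths[i - 1])
        area += mean_width * (stresses[i] - stresses[i - 1])
    return area

# Hypothetical readings: elastic strain sigma/E plus or minus half of a loop width
# that is zero at the stress extremes and largest at zero stress.
E = 30.0e6                                            # assumed modulus, psi
stress_levels = [-80_000.0, -40_000.0, 0.0, 40_000.0, 80_000.0]
loop_widths = [0.0, 5.0e-6, 1.0e-5, 5.0e-6, 0.0]
loading = [s / E - w / 2.0 for s, w in zip(stress_levels, loop_widths)]
unloading = [s / E + w / 2.0 for s, w in zip(stress_levels, loop_widths)]

print(hysteresis_loop_area(stress_levels, loading, unloading))
```

Because only the difference between the two branches enters the sum, the elastic part of the strain cancels and the result depends on the inelastic strain alone.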
Crack Propagation in Thin Metal Sheet
Under Repeated Loading 


An experimental and analytical investigation was undertaken to study the funda- 
mental factors of crack propagation in a thin metal sheet under repeated axial loading. 

Sheet specimens of 2024-T3 aluminum alloy containing a central hole were used in 
the experimental investigation. Stress range and mean stress were the controlled 
parameters. 

An expression for crack length was derived, based on the concept of geometric
similarity of crack configuration, for a semi-infinite sheet subjected to repeated
loads consisting of a constant stress range and mean stress. The expression is in
terms of a stress dependent propagation factor and an exponential function of the
number of cycles of loading.

The expression for a semi-infinite sheet was modified for crack propagation in a
specimen of finite width. Accurate prediction of the propagation life was possible
using the modified equation.

Photomicrographic observations of the crack tips were made.


Introduction 


FATIGUE fracture is caused by initiation and cyclic
propagation of one or more cracks. Because the presence of a 
crack changes the geometrical configuration of a member, the 
effects of stress range, mean stress, state of stress, notches, surface 
finish, specimen size, and environment appear to be different 
during crack initiation as compared with crack propagation. 
Therefore, separate investigations of crack initiation and propa- 
gation will contribute more to the understanding of the funda- 
mentals of fatigue fracture. 

Many of the earlier investigations studied the change of crack 
propagation rate at different stages [1, 2]¹ and tried to relate
only qualitatively propagation rate or propagation life to degree 
of surface cold working [1], composition of alloy content [3], 
and notch severity [4, 5, 6]. 

In the last few years, various fatigue fracture mechanisms or
fatigue fracture criteria have been proposed, such as Orowan’s
mechanism of fracture [7, 8], Bowie’s rate of strain energy
release [9], Irwin’s crack driving force [9, 10], and the theoretical
stress at the tip of the crack [7, 9, 5, 11]. The success of these
proposed mechanisms or criteria depends on the agreement
between the proposed and the actual mechanisms or criteria as well
as on the ability to calculate the actual stress and strain near the
tip of the crack.

¹ Numbers in brackets designate References at end of paper.

Contributed by the Metals Engineering Division of The American
Society of Mechanical Engineers and presented at the ASME-
AWS Metals Engineering Conference, Los Angeles, Calif., April 25-
29, 1960. Manuscript received at ASME Headquarters, February 17,
1960. Paper No. 60-Met-11.

This investigation was undertaken to study experimentally and
analytically the macroscopic characteristics of fatigue crack
propagation.

Thin sheet specimens loaded in repeated tension were used
because of the simplicity of analysis and experimental observations.
Stress range and mean stress were controlled parameters. Crack
lengths were recorded by time lapse photography at regular cycle
intervals and measured from the film. Metallographic
observations of the crack tip and slip band region were made to study the
size of the plastic zone and the crack branching phenomenon.

Crack propagation in the semi-infinite sheet under constant
stress range and mean stress was analyzed using dimensional
analysis. An expression for crack length in terms of number of
cycles of load and a stress dependent propagation factor was
obtained. The equation was modified for specimens of finite width.


Nomenclature

$b$ = side of a square material element
$C_1$ = constant
$C$ = crack propagation factor
$C_{ave}$ = average crack propagation factor
$C_0$ = crack propagation factor in a semi-infinite sheet
$D$ = $0.000123\,\Delta\sigma$
$Ei(x)$ = exponential integral of argument $x$
$f, f_1$ = mathematical functions
$H_{ijkl}$ = coefficient in Eq. (3)
$l$ = crack length
$l_0$ = initial crack length
$L$ = width of specimen
$N$ = number of cycles of load
$N_0$ = number of cycles of load corresponding to $l_0$
$N_e$ = experimental fatigue life
$N_a$ = fatigue life by Eq. (14)
$t$ = thickness of the specimen
$\epsilon_{ij}$ = strain tensor
$\theta$ = angle defined in Fig. 1
$\sigma_\theta(x, y)$ = stress at point $P(x, y)$ in the direction of $\theta$
$\Delta\sigma$ = stress range
$\sigma_m$ = mean stress
$\sigma_0$ = nominal stress
$\sigma_{ij}$ = stress tensor


Analysis of Crack Propagation in Thin Sheet Under Repeated
Loading


A thin sheet of material of thickness $t$ and width $L$, containing
a crack of length $l$ as shown in Fig. 1, is loaded axially by a
combination of a constant amplitude repeated stress and a constant
mean stress. A total nominal stress $\sigma_0$ is shown acting on the
specimen in Fig. 1.


Fig. 1 Sheet specimen with a central crack

The rate of cyclic propagation, $dl/dN$, of the crack will be de-
termined in terms of the geometry of the crack using the concept 
of dimensional analysis. It is assumed that the mechanism of 
crack propagation is the same in every region of the material and 
that the conditions of crack propagation are adequately specified 
by the stress-strain relations, the state of stress, and the stress or 
strain-cycle history at each point in the material. 

Neglecting the microscopic variables, the stress $\sigma_\theta(x, y)$ at
point $P(x, y)$ in the direction of $\theta$ can be written as


$$\sigma_\theta(x, y) = f(\sigma_0, x, y, \theta, t, L, \text{configuration of the crack, complete stress-strain relation of the material}) \tag{1}$$


If plane stress is assumed and the width $L$ is large relative to the
crack length $l$, the variables $t$ and $L$ can be excluded. If the
shape of the crack for a given value of $\sigma_0$ remains geometrically
similar, independent of the crack length,² the configuration of the
crack can be specified by crack length alone. Therefore, Eq. (1)
can be written as


$$\sigma_\theta(x, y) = f(\sigma_0, x, y, \theta, l, \text{complete stress-strain relation of the material}) \tag{2}$$


The stress-strain relations depend on the intrinsic properties of 
the material as well as on the stress or strain history. Upon re- 
peated loading different points in the specimen experience dif- 
ferent stress or strain histories. Therefore, the stress-strain rela- 
tions will be different for the material at different points in the 
sheet. Without the exact solution to this stress analysis problem, 
as well as an exact description of the behavior of the material, it 
is impossible to specify the stress-strain relations of the material 
of the whole specimen with one set of parameters. 

For the purpose of analysis, divide the specimen into rows and 
columns of small square elements of side b. The squares can be 
made as small as necessary to insure that the stress-strain rela- 
tions are the same for all of the material in each element. The 
complete stress-strain relations for the entire specimen are the 
aggregate of the stress-strain relations for each of the small 
squares which constitute the specimen. 

For any material, the stress-strain relations can be written as


$$d\sigma_{ij} = H_{ijkl}\, d\epsilon_{kl} \tag{3}$$


where $\sigma_{ij}$ and $\epsilon_{ij}$ are stress tensor and strain
tensor, respectively. The value of $H_{ijkl}$ depends on the previous
stress and strain history experienced by the material. Therefore, the
stress-strain relations of any square can be specified by a set of
parameters $H^{\alpha\beta}_{ijkl}$. The indices $\alpha$ and $\beta$
specify the position of the square element in terms of the
$\alpha$'th column and $\beta$'th row. All of these parameters have the
dimensions of $\sigma$. The complete stress-strain relations of the
material of the whole specimen can be specified by M sets of such
quantities where M is the total number of squares.

² This assumption will be discussed in detail later in this section.

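Written in the form of dimensionless quantities, Eq. (2) becomes


$$\frac{\sigma_\theta}{\sigma_0} = f_1\!\left(\frac{x}{l}, \frac{y}{l}, \theta, \frac{b}{l}, \frac{H^{\alpha\beta}_{ijkl}}{\sigma_0}\right) \tag{4}$$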


To study the similarity of two specimens, consider a “model”
and a “prototype” of the sheet of material containing a crack.
For complete similarity, the model laws require that the stress-
strain relations of homologous squares³ be identical, if $\sigma_0$ is
the same for both model and prototype. In addition, the following
conditions also must be satisfied:


$$\sigma_{\theta 1}(x_1, y_1) = \sigma_{\theta 2}(x_2, y_2) \tag{5a}$$

at points $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ such that

$$\frac{x_1}{l_1} = \frac{x_2}{l_2} \tag{5b}$$

$$\frac{y_1}{l_1} = \frac{y_2}{l_2} \tag{5c}$$

$$\frac{b_1}{l_1} = \frac{b_2}{l_2} \tag{5d}$$

$$\theta_1 = \theta_2 \tag{5e}$$


where subscripts 1 and 2 denote model and prototype,
respectively.
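
The notion of homologous points can be illustrated with a short sketch. It assumes that two points are homologous when their co-ordinates, divided by the respective crack lengths, are equal; the function names, the tolerance, and the numbers in the usage lines are hypothetical.

```python
def homologous_point(point, l_from, l_to):
    """Map a point in one configuration to its homologous point in the other.

    point: (x, y) co-ordinates in the configuration with crack length l_from
    Returns the (x, y) co-ordinates in the configuration with crack length l_to.
    """
    scale = l_to / l_from
    x, y = point
    return (x * scale, y * scale)

def are_homologous(p1, l1, p2, l2, tol=1e-9):
    """True when x/l and y/l agree for the two points within a tolerance."""
    same_x = abs(p1[0] / l1 - p2[0] / l2) <= tol
    same_y = abs(p1[1] / l1 - p2[1] / l2) <= tol
    return same_x and same_y

# Hypothetical example: a point near a 0.5-in. crack and its counterpart
# near a 2.0-in. crack.
p_model = (0.30, 0.05)
p_proto = homologous_point(p_model, 0.5, 2.0)
print(p_proto, are_homologous(p_model, 0.5, p_proto, 2.0))
```

The same scaling applies to the side of the square elements, so that homologous squares cover homologous regions of the two sheets.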

Equation (5) states that, if the nominal stress $\sigma_0$ is the same
for both model and prototype, and if the stress-strain relations
of all homologous squares are the same, then the state of stress at
any two homologous points is the same. The assumption that
the homologous squares experience the same stress or strain
history was made, and it presupposes the condition of identical
stress-strain relations for the material of homologous squares. If
$\Delta l_1$ and $\Delta l_2$ are homologous increments of crack length,
i.e., if they satisfy the condition


$$\frac{\Delta l_1}{l_1} = \frac{\Delta l_2}{l_2} \tag{6}$$


then the number of cycles of load, $\Delta N$, to propagate a crack
length $\Delta l_1$ and $\Delta l_2$ must be the same if the mechanism of
crack propagation is the same for both cases. Therefore, Eq.
(6) can be written as


$$\frac{\Delta l_1}{l_1\,\Delta N} = \frac{\Delta l_2}{l_2\,\Delta N} \tag{7}$$


Now consider the subscripts 1 and 2 as two stages in the course of
crack propagation through one specimen rather than crack
propagation in two similar specimens.⁴ Equation (7) describes the
basic law of crack propagation in a thin semi-infinite sheet.

Dropping the subscripts and writing Eq. (7) in differential and 
integrated form give, 


³ In general, there is a point-to-point correspondence between a 
model and its prototype. Two points that correspond to each other 
are homologous. 

⁴ Consideration of two stages in the course of crack propagation in 
one specimen necessarily introduces different stress histories for the 
two stages. The significance of this limitation will be discussed later. 


Transactions of the ASME 




$$\frac{dl}{dN} = Cl \tag{8a}$$

and 

$$\ln l - \ln l_0 = C(N - N_0) \tag{8b}$$
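
The integrated form, Eq. (8b), can be evaluated directly. The following is a minimal sketch, assuming the propagation factor C is constant over the interval of interest; the function names are hypothetical, and the numerical inputs are used only as illustrative values (l₀ taken as the 0.033-in. hole diameter, C and N₀ as reported for one specimen).

```python
import math

def crack_length(n, n0, l0, c):
    """Crack length l after n cycles, from Eq. (8b): ln l - ln l0 = C (N - N0)."""
    return l0 * math.exp(c * (n - n0))

def cycles_to_reach(l, n0, l0, c):
    """Cycle count N at which the crack reaches length l (Eq. (8b) solved for N)."""
    return n0 + math.log(l / l0) / c

# Illustrative inputs (assumed for this sketch only).
l0, c, n0 = 0.033, 0.000313, 12310
n_at_007 = cycles_to_reach(0.07, n0, l0, c)   # cycles at which l = 0.07 in.
print(round(n_at_007), crack_length(n_at_007, n0, l0, c))
```

Because ln l is linear in N, the number of cycles needed to double the crack length is ln 2 / C regardless of the starting length.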


In the course of the derivation of Eq. (8), two assumptions were 
made. The shape of the crack for each value of σ₀ was assumed 
geometrically similar independent of the crack length, and the 
stress-strain relations of homologous squares were assumed to be 
identical. The solution, Eq. (8), and the assumptions are valid 
only if they are compatible with one another. 

It can be shown that the assumption of geometrical similarity 
of the shapes of crack is contingent upon the condition of identical 
stress-strain relations for homologous squares [12]. 

It can also be shown according to Eqs. (8) that two homologous 
points at two stages of crack propagation remain homologous as 
the crack propagates [12]. Therefore, two homologous points 
experience increments of stress and strain history that are identi- 
cal during cycle intervals of the same size. Therefore, two 
homologous points have identical stress-strain relations at the 
end of a cycle interval if the stress-strain relations were identi- 
cal at the beginning of the cycle interval. Increments of identical 
stress and strain history can be traced either backward or for- 
ward for homologous points. 

The one remaining consideration which determines the com- 
patibility of the initial assumptions and the final solution is the 
influence of previous stress history. 

Consider the complete stress history of two homologous points, 
P₁ and P₂, at two stages, 1 and 2, of crack propagation. The 
crack length at stage 1, after N₁ cycles, is l₁ and at stage 2, after 
N₂ cycles, is l₂, where N₂ > N₁ and l₂ > l₁. During the interval 
from the first load cycle until a crack is initiated at N₀ cycles, 
both points, P₁ and P₂, experience the same stress history,⁵ since 
the original specimen configuration remains unchanged. 

Assume for the moment that the stress-strain relations for P₁ 
and P₂ are identical at N₁ and N₂. Now tracing the stress history 
backward from N₁ and N₂ for P₁ and P₂, respectively, it is clear 
that P₁ and P₂ experience the same stress or strain history for 
(N₁ − N₀) cycles just prior to N₁ and N₂. Therefore, P₂ experiences 
stress or strain history during the interval 
from the N₀th cycle to the (N₂ − N₁ + N₀)th cycle in addition 
to that experienced by P₁. The condition of compatibility of the 
assumption and the solution requires, for the material at point P₂, 
that the stress-strain relations remain unchanged during this 
cycle interval. To satisfy this condition, it is evident that all of 
the material must exhibit a stable response to cycles of stress σ₀ 
(and slightly higher) following the N₀th cycle. For sufficiently 
low values of σ₀ it may be anticipated that the condition will be 
satisfied. For some high value of σ₀, as compared with the 
properties⁶ of the material, the stress-strain relations for any 
point P₂ will continue to undergo change even after N₀ cycles. 
Thus, it appears that the realm of validity of Eq. (8) may be specified 
in terms of a dimensionless ratio involving the applied stress 
σ₀ and some material property representing the response of the 
material after N₀ cycles. 

It is concluded that, within the limitation imposed by the in- 
fluence of stress history as outlined above, the assumptions made 
in the foregoing derivation are compatible with the final solu- 
tion, and Eq. (8) is the law of crack propagation in a thin sheet of 
infinite width under repeated loading of a given amplitude and 
mean stress. In the derivation, no assumptions were made con- 
cerning the mechanism of crack propagation, the stress or strain 


⁵ This is not applicable to points that are within the influence of a 
geometrical discontinuity employed as a crack starter. 

⁶ Yield strength, proportional limit, strain-hardening coefficient, 
etc. 


Journal of Basic Engineering 


distribution, or the material properties. Therefore, the solution 
is general in these three respects. 

For the over-all problem of crack propagation, the geometry, 
the mechanism of crack propagation, the stress or strain distribu- 
tion, and the material properties all must be considered. The 
geometry will determine the general form of the equation as given 
by Eq. (8); the mechanism of crack propagation, the stress or 
strain distribution, and the material properties will relate the 
parameter or parameters, such as C in Eq. (8), to the applied 
stress. Therefore, Eq. (8) is the partial solution of the over-all 
problem of crack propagation in thin sheet. The constant C in 
Eq. (8) will be correlated empirically to the stress amplitude and 
mean stress in this investigation. 

The solution, Eq. (8), is for a thin sheet of infinite width. For 
this configuration, the stress distribution remains similar at dif- 
ferent stages of crack propagation. For a specimen of finite width 
under constant amplitude repeated load, the stress amplitude as 
well as the mean stress increases as the net section of the speci- 
men is reduced by the propagation of the crack. Thus, appropri- 
ate modifications should be made. For a specimen of finite 
width, the stress distribution changes as the crack length in- 
creases, and the change becomes considerable as the crack ap- 
proaches a critical size. Thus, an exact solution for a sheet of 
finite width appears to be very difficult because for various stages 
of crack propagation the concept of similarity is not applicable 
and the exact stress distribution is unknown. 

Equation (8) was suggested without proof by Frost and Dug- 
dale [13] based on the premise of geometrical similarity of the 
crack configuration at each stage of propagation assuming elastic 
conditions at all points throughout the specimen. In the present 
derivation the limitation of elastic conditions was removed and the 
physical conditions limiting the applicability of Eq. (8) were 
stated. 


Experimental Investigation 
1 Material and Specimens 

Specimens were cut from 2024-T3 aluminum alloy sheets 0.02 in. 
thick. The mechanical properties were measured as follows: 

Yield strength (0.03 per cent offset)    51,500 psi 
Yield strength (0.2 per cent offset)     53,500 psi 
Ultimate strength                        69,500 psi 


Critical dimensions of the specimens are shown in Fig. 2. A 
circular hole, 0.033 in. in diameter and located at the center of the 
4-in.-wide specimen, was used as a stress raiser to initiate the 
fatigue cracks. 

Fig. 2 Critical dimensions of fatigue specimen 

The specimens were buffed and hand polished by a standardized 
process to produce a uniform, smooth, and reflective surface 
across the entire width of the central region. 


2 Apparatus and Experimental Procedure 

The specimens were loaded axially using various combinations 
of minimum and maximum tensile loads to obtain a planned 
sequence of values of range of stress and mean stress. For each 
combination, except one, at least two specimens were tested under 
identical conditions to provide an indication of the reproducibility 
and scatter of the data. 

A schematic diagram of the experimental apparatus is shown 
in Fig. 3. The stress amplitude was controlled by an adjustable 
eccentric crank which loaded the specimen at 600 rpm. The ec- 
centric crank was connected to a loading beam by a turnbuckle, 
which controlled the mean stress. Both stress amplitude and 
mean stress could be adjusted easily and accurately. The over-all 
experimental error in the applied stress was estimated to be 
±200 psi. 

The crack length and the number of cycles of load were re- 
corded by time lapse photography at regular cycle intervals. 
The camera was controlled by an electromechanical system which 
performed two functions: (1) taking pictures at regular cycle 
intervals and (2) taking each picture when the stress was within 
90 per cent of the maximum stress in order to obtain maximum 
definition of the cracks on the film. This electromechanical 
system is shown schematically in Fig. 3. 

The crack length was measured by viewing the film through a 
microscope mounted on a calibrated traveling mechanical stage. 
The measurements, considering all sources of error, were estimated 
to be accurate to ±0.001 in. 

All experimental procedures were performed in a consistent 
manner throughout the entire investigation. Special care was 
taken to maintain the maximum and the minimum loads constant 
during each experiment. 

Fig. 3 Schematic diagram of test apparatus 


Results and Discussion 


The experimental results and the analyses of data for all 
specimens are summarized in Table 1. The specimens were 
numbered so that the first two digits indicate the maximum stress 
in thousands of psi and the last two digits indicate the minimum 
stress. The stress range and mean stress are given in Table 1. 
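
The numbering convention can be decoded mechanically. The sketch below is only an illustration of the rule stated above; the function name `decode_specimen` is hypothetical, and the trailing letter is assumed to distinguish repeat specimens tested at the same stresses.

```python
def decode_specimen(label):
    """Split a label such as '3614b' into its nominal stresses in psi."""
    max_stress = int(label[0:2]) * 1000   # first two digits: maximum stress, thousands of psi
    min_stress = int(label[2:4]) * 1000   # last two digits: minimum stress, thousands of psi
    return {
        "max": max_stress,
        "min": min_stress,
        "range": max_stress - min_stress,          # stress range
        "mean": (max_stress + min_stress) // 2,    # mean stress
    }

print(decode_specimen("4002a"))
```

For specimen 4002a this gives a stress range of 38,000 psi and a mean stress of 21,000 psi, which agrees with the first row of Table 1.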

In Fig. 4, a sample diagram is shown of the crack length l, plotted on a 
logarithmic ordinate scale, against the number of cycles of load N, 
plotted as the abscissa. The crack lengths were measured 
from the tip of one crack to the tip of the other. The data 
for all 34 specimens indicate that crack propagation can be divided 
into three periods: the initial, the middle, and the final. 

The cracks propagated very irregularly in the initial period. 
This irregularity was probably caused by several factors including 
variations in the material introduced by machining the central 
hole, the slow propagation rate associated with a small crack 
length combined with microscopic inhomogeneities, and the dif- 
ficulty of measuring the crack length when the crack was small. 
Irregular crack propagation was observed in each of the three 
periods; however, it was considerably more sporadic during the 
initial period. In addition, the logarithmic ordinate employed 
tends to accentuate the irregularities of the short cracks and 
minimize the irregularities of the long cracks. The initial period 
occupied approximately one quarter of the entire propagation 
cycle life and ended as the crack length approached 0.07-in. 

As the crack propagated into the middle period, the variation 
of ln l with N was linear as predicted by Eq. (8). The crack 
configuration presumably reached a “stable” shape which was maintained 
until the assumption of a semi-infinite sheet was no longer 
valid. 

The propagation factor, C in Eq. (8), was evaluated from the 
slope of this straight line portion of the curve for each specimen. 
Values of C for individual specimens as well as the average values 
C_avg for specimens tested at the same stress range and mean 
stress are tabulated in Table 1. It was pointed out earlier that 
the propagation factor C is related to the applied stresses. In 
Fig. 5, C_avg is plotted as the ordinate, on a logarithmic scale, and 
the mean stress is shown as the abscissa. Straight lines have been 
drawn through the points which represent the same stress ranges, 
the same maximum stresses, and the same minimum stresses. The 
three families of equally spaced lines thus formed provide a very 
good representation of the trends exhibited by the data. 

Fig. 4 Fatigue crack propagation diagram, 2024-T3 aluminum alloy 
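
The slope evaluation described above can be sketched as an ordinary least-squares fit of ln l against N. This is an assumption about how one might automate the graphical procedure, not the authors' stated method; the function names are hypothetical, and the data are synthetic values generated from Eq. (8b), not measurements.

```python
import math

def fit_propagation_factor(cycles, lengths):
    """Least-squares line through (N, ln l); returns (C, intercept)."""
    ys = [math.log(l) for l in lengths]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, ys))
    c = sxy / sxx                     # slope of ln l versus N is C
    return c, mean_y - c * mean_x

def extrapolate_n0(c, intercept, l0):
    """Cycle count at which the fitted line reaches the length l0."""
    return (math.log(l0) - intercept) / c

# Synthetic middle-period data (assumed values, for illustration only).
true_c, true_n0, l0 = 1.0e-4, 20000.0, 0.033
cycles = [26000.0 + 1000.0 * k for k in range(10)]
lengths = [l0 * math.exp(true_c * (n - true_n0)) for n in cycles]

c_fit, intercept = fit_propagation_factor(cycles, lengths)
n0_fit = extrapolate_n0(c_fit, intercept, l0)
print(c_fit, n0_fit)
```

Only points from the linear middle period should be passed to the fit; including the irregular initial period or the accelerating final period would bias the slope.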

The qualitative variation of C, with respect to the influence of 
stress on the rate of crack propagation, as indicated by these 
three families of lines, forms a consistent pattern of behavior. 
For a constant mean stress, C increases with an increase in stress 
range, which may be achieved by increasing the maximum 
stress and simultaneously decreasing the minimum stress. Similarly, 
if the stress range is maintained constant, C increases with 
an increase of mean stress, which may be achieved by increasing 
both the maximum stress and the minimum stress. 

Fig. 5 Correlation between crack propagation factor and mean stress 


Table 1 Analysis of fatigue crack propagation data for 2024-T3 aluminum alloy* 

Specimen Δσ₀,    σ_m0,   C          C_avg      C₀         C₀(avg)    N₀,      N_fe,    N_ft,    N_fe − N₀, N_ft − N₀, (N_fe − N₀)/ 
number   psi     psi                                                 cycles   cycles   cycles   cycles     cycles     (N_ft − N₀) 
4002a    38,000  21,000  0.000313   0.000289   0.000285   0.000264   12,310   19,450   21,200   7,140      8,890      0.803 
4002b    38,000  21,000  0.000305              0.000278              10,170   17,180   19,300   7,010      9,130      0.768 
4002c    38,000  21,000  0.000250              0.000230              9,890    17,510   20,920   7,620      11,030     0.691 
4006a    34,000  23,000  0.000221   0.000203   0.000208   0.000191   13,390   22,260   26,040   8,870      12,650     0.701 
4006b    34,000  23,000  0.000191              0.000180              12,790   23,050   27,390   10,260     14,600     0.703 
4006c    34,000  23,000  0.000197              0.000185              11,440   21,240   25,650   9,800      14,210     0.690 
4010a    30,000  25,000  0.000151   0.000149   0.000147   0.000143   17,760   31,630   36,340   13,870     18,580     0.747 
4010b    30,000  25,000  0.000147              0.000138              19,400   33,440   39,090   14,040     19,690     0.713 
4014a    26,000  27,000  0.000129   0.000120   0.000120   0.000111   29,610   47,730   53,070   18,120     23,460     0.772 
4014b    26,000  27,000  0.000108              0.0000988             26,560   47,870   55,060   21,310     28,500     0.748 
4014c    26,000  27,000  0.000124              0.000116              17,610   35,290   41,910   17,680     24,300     0.728 
4014d    26,000  27,000  0.000120              0.000110              28,930   47,670   54,430   18,740     25,500     0.735 
4018a    22,000  29,000  0.0000899  0.0000860  0.0000855  0.0000815  34,800   61,500   69,450   26,700     34,650     0.771 
4018b    22,000  29,000  0.0000820             0.0000774             41,850   70,740   80,120   28,890     38,270     0.755 
3602a    34,000  19,000  0.000167   0.000167   0.000151   0.000154   22,760   36,980   40,190   14,220     17,430     0.816 
3602b    34,000  19,000  0.000166              0.000157              19,600   33,150   36,370   13,550     16,770     0.808 
3606a    30,000  21,000  0.000138   0.000133   0.000127   0.000123   24,700   40,900   46,210   16,200     21,510     0.753 
3606b    30,000  21,000  0.000135              0.000125              27,500   44,330   49,240   16,830     21,740     0.774 
3606c    30,000  21,000  0.000125              0.000118              22,600            45,660              23,060 
3610a    26,000  23,000  0.000124   0.000118   0.000114   0.000109   37,390   57,830   62,090   20,440     24,700     0.827 
3610b    26,000  23,000  0.000112              0.000104              25,370   45,630   52,470   20,260     27,100     0.748 
3614a    22,000  25,000  0.0000754  0.0000776  0.0000720  0.0000737  74,300   107,360  115,470  33,060     41,170     0.803 
3614b    22,000  25,000  0.0000798             0.0000754             49,450   82,590   88,760   33,140     39,310     0.843 
3202a    30,000  17,000  0.000102   0.000105   0.0000969  0.0000995  24,700   46,030   52,820   21,330     28,120     0.759 
3202b    30,000  17,000  0.000108              0.0001036             30,400   52,100   56,700   21,700     26,300     0.825 
3202c    30,000  17,000  0.000104              0.0000979             34,520   56,350   62,330   21,830     27,810     0.785 
3206a    26,000  19,000  0.0000857  0.0000898  0.0000784  0.0000829  36,300   63,910   72,190   27,610     35,890     0.769 
3206b    26,000  19,000  0.0001046             0.0000968             40,400   63,850   69,470   23,450     29,070     0.807 
3206c    26,000  19,000  0.0000792             0.0000736                               79,570              38,250 
3210a    22,000  21,000  0.0000702  0.0000666  0.000067   0.0000634  74,150   109,920  118,625  35,770     44,475     0.804 
3210b    22,000  21,000  0.0000630             0.0000597             65,500   103,880  115,148  38,380     49,648     0.773 
2802a    26,000  15,000  0.0000631  0.0000725  0.0000586  0.0000672  61,750   100,880  109,770  39,130     48,020     0.815 
2802b    26,000  15,000  0.0000818             0.0000757             44,140   75,280   81,310   31,140     37,170     0.838 
2806a    22,000  17,000  0.0000512  0.0000512  0.0000486  0.0000486  103,550  151,290  164,470  47,740     60,920     0.784 

(*) Terms used in Table 1 are defined in the Nomenclature. 

The middle period of crack propagation covered about one 
half of the propagation life, and a range of crack lengths from ap- 
proximately 0.07-in. to 0.16-in. 

The rate of crack propagation accelerated rapidly during the 
final period. This rapidly increasing rate of propagation was due 
to two factors: the increase in stress range as well as mean stress 
due to the reduction of cross-sectional area, and a change of 
stress distribution at the tip of the crack as the ratio of crack 
length to specimen width increased. 

Modification of Eq. (8) for the increasing stress range and mean 
stress will now be discussed in detail. In the discussion of the ex- 
perimental investigation, it was pointed out that the maximum 
and the minimum loads remain constant until the very last part 
of the test. Therefore, the stresses increased as the cross-sec- 
tional area reduced, and the stress range and the mean stress are 
explicit functions of crack length [12]. The data in Fig. 5 indi- 
cate that the propagation factor is a function of stress range and 
mean stress. Therefore, the propagation factor in turn can be 
written as a function of crack length. The modified relationship 
between the crack length and the number of cycles of loading 
can be obtained by substituting the propagation factor in terms of 
crack length into Eq. (8a) and integrating. 

In Fig. 5 the small slope of the equi-stress-range lines indicates 
that the stress range, Δσ, has a greater influence on the value of C 
than mean stress.⁷ Therefore, in Fig. 6, C_avg is plotted as the 
ordinate, on a logarithmic scale, and the stress range is shown on 
the abscissa. The solid lines are equi-maximum-stress lines, and 
the broken straight lines are the traces of the change of C due to 
reduction of cross-sectional area. The slopes of the broken lines 
are very nearly constant. Therefore, using an average slope, 
0.000123, to represent all of the broken lines, the propagation 
factor C can be expressed at any stage of the propagation, in 
terms of crack length, by taking into consideration the change of 
stress range with respect to crack length. 

By using an average slope for the broken lines and describing C 
in terms of stress range only, the effect of the second stress 
parameter was neglected. The expression of C thus obtained is 


$$C = C_0 \exp\left(\frac{D\,l}{L - l}\right) \tag{9}$$

where 

D = 0.000123Δσ₀ 


Substituting Eq. (9) into Eq. (8a), rearranging terms, and integrating 
gives 

$$N - N_0 = \frac{1}{C_0}\left\{-Ei[-(p_0 - D)] + Ei[-(p - D)]\right\} - \frac{e^{D}}{C_0}\left\{-Ei(-p_0) + Ei(-p)\right\} \tag{10}$$

where 

$$p = \frac{D}{1 - l/L}$$


and the subscript “0” indicates the initial condition. The term 
−Ei(−x) in Eq. (10) is the exponential integral of argument x, 
and is defined as 

$$-Ei(-x) = \int_x^{\infty} \frac{e^{-u}}{u}\,du$$

The values of this integral are tabulated. 

⁷ The relative magnitude of the slopes of equi-stress-range, 
equi-maximum-stress, and equi-minimum-stress lines in Figs. 5 and 6 
indicates the relative influence of these four stress quantities on the 
propagation factor. Arranged in the order of decreasing influence, 
they are: stress range, maximum stress, mean stress, and minimum 
stress [12]. 

Equation (10) represents a modification of Eq. (8) for application 
to a specimen of finite width. The effects of the continuously 
changing stress range and mean stress caused by crack propagation 
are included; however, the equation does not account for a change in 
the stress distribution as the ratio l/L becomes large. 
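
As a cross-check on the closed-form result, the growth law can also be integrated numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the stress range rises with the net section as Δσ₀/(1 − l/L), so that ln C increases by D·l/(L − l) with D = 0.000123Δσ₀, and it integrates dN = dl/(C·l) by the midpoint rule. The function names, the step count, and the choice of C₀ = 0.000285, Δσ₀ = 38,000 psi, l₀ = 0.033 in., and L = 4 in. are assumptions made for the example.

```python
import math

def propagation_factor(l, c0, d, width):
    """Propagation factor at crack length l, assuming C = C0 * exp(D*l/(L - l))."""
    return c0 * math.exp(d * l / (width - l))

def cycles_between(l_start, l_end, c0, d, width, steps=200_000):
    """Midpoint-rule integral of dN = dl / (C(l) * l) from l_start to l_end."""
    h = (l_end - l_start) / steps
    total = 0.0
    for k in range(steps):
        l = l_start + (k + 0.5) * h
        # exp(-x) form avoids overflow as l approaches the sheet width
        total += math.exp(-d * l / (width - l)) / (c0 * l)
    return total * h

d = 0.000123 * 38000.0          # D for an initial stress range of 38,000 psi
life = cycles_between(0.033, 4.0, 0.000285, d, 4.0)
print(round(life))              # predicted cycles from l0 to fracture (l = L)
```

With these inputs the integral comes out close to 8,900 cycles, in line with the theoretical propagation lives listed in Table 1 for this stress combination.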

The quantities N₀ and C₀ will now be evaluated for the purpose 
of comparing propagation as predicted by Eq. (10) with the 
experimental data. 

The initial condition in Eq. (10) is represented by l₀, which was 
arbitrarily chosen to be the diameter of the hole and was equal to 
0.033 in. The quantity N₀ is the value of N corresponding to l₀ 
and may be estimated by extrapolating the straight line portion 
of the curve in Fig. 4 back to l₀. This graphical method of 
determination of N₀ is shown schematically in Fig. 4 and is equivalent 
to an algebraic solution for N₀ using Eq. (8). The values of 
N₀ obtained for each specimen are tabulated in Table 1. 

The calculation of the propagation factor C was based on the 
assumption that the specimen is infinitely wide. However, the 
specimen is of finite width, and the propagation of the crack is 
influenced by the increase in stress accompanying crack propagation. 
This effect is reflected in the values of C measured as the 
slope of the line representing the experimental data during the 
middle stage of propagation. Therefore, this value of C is not 
the propagation factor for a given stress range and mean stress 
in a semi-infinite sheet. However, a first approximation to the 
propagation factor C₀ for a semi-infinite sheet at the initial values 
of stress range and mean stress will be made. Equation (10) may 
be rewritten, solving for C₀ and evaluating for the fracture 
conditions, i.e., l = L and N = N_ft, giving 

$$N_{ft} - N_0 = \frac{1}{C_0}\left\{-Ei[-(p_0 - D)] + e^{D}\,Ei(-p_0)\right\} \tag{11}$$

$$C_0 = \frac{-Ei[-(p_0 - D)] + e^{D}\,Ei(-p_0)}{N_{ft} - N_0} \tag{12}$$


If N_ft, the theoretical number of cycles at fracture, were known, 
C₀ could be calculated. To determine N_ft, it is convenient to 
form the ratio (N − N₀)/(N_ft − N₀), which will be called the 
propagation cycle ratio. The relationship between (N − N₀)/ 
(N_ft − N₀) and (l/L) can be calculated using Eq. (10), since the 
quantity C₀ does not appear in the expression for the propagation 
cycle ratio. From this relationship, N_ft can be calculated for 
each specimen using the previously determined values of N₀ and 
one additional experimental point. The points used in this 
calculation were chosen from the linear portion of the curves 
because, in this region, the assumption of a semi-infinite sheet is still 
valid and the effect of a change of stress distribution can still be 
neglected. The values of N_ft thus obtained provide all of the 
necessary information to evaluate C₀ using Eq. (12). 
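
The evaluation of C₀ from a theoretical propagation life can be sketched as follows. This is a hedged illustration: it assumes p₀ = D/(1 − l₀/L) with D = 0.000123Δσ₀, L = 4 in., and l₀ = 0.033 in., and it writes the fracture-condition form of Eq. (10) as C₀ = {−Ei[−(p₀ − D)] + e^D Ei(−p₀)}/(N_ft − N₀). The helper `e1` (a power series for small arguments and a continued fraction otherwise) and the other function names are hypothetical and stand in for the tabulated integral values.

```python
import math

EULER_GAMMA = 0.5772156649015329

def e1(x):
    """-Ei(-x): integral of exp(-u)/u from x to infinity, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x < 1.0:
        # power series: -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k            # term is now (-x)^k / k!
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # continued fraction, evaluated from the innermost level outward
    depth = 200
    f = x + 2 * depth + 1
    for k in range(depth, 0, -1):
        f = x + (2 * k - 1) - k * k / f
    return math.exp(-x) / f

def fracture_bracket(d, l0, width):
    """-Ei[-(p0 - D)] + e^D Ei(-p0), with p0 = D / (1 - l0/L)."""
    p0 = d / (1.0 - l0 / width)
    return e1(p0 - d) - math.exp(d) * e1(p0)

def initial_propagation_factor(life_cycles, stress_range0,
                               l0=0.033, width=4.0, slope=0.000123):
    """C0 from the theoretical propagation life (N_ft - N0)."""
    d = slope * stress_range0
    return fracture_bracket(d, l0, width) / life_cycles

c0 = initial_propagation_factor(8890.0, 38000.0)
print(c0)   # approximately 0.000285
```

For a theoretical propagation life of 8,890 cycles at an initial stress range of 38,000 psi, the sketch returns a value close to the C₀ of 0.000285 listed for the first specimen in Table 1.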

The initial propagation factor C₀ for every specimen and the 
average for specimens with the same stress range and mean stress 
are tabulated in Table 1. The average values of C₀ are plotted 
against the mean stress in Fig. 7. The experimental points are 
well represented by straight lines similar to Fig. 5. Based on this 
diagram, the value of C₀ can be related empirically to Δσ₀ and 
σ_m0 as 

$$\ln C_0 = 0.0000930\,\Delta\sigma_0 + 0.0000391\,\sigma_{m0} - 12.61 \tag{13}$$
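
The empirical relation of Eq. (13) is simple to evaluate. The sketch below is illustrative only; the function name is hypothetical, and the stresses are assumed to be in psi, as in Table 1.

```python
import math

def initial_factor_eq13(stress_range0, mean_stress0):
    """C0 from Eq. (13): ln C0 = 0.0000930*dsigma0 + 0.0000391*sigma_m0 - 12.61."""
    ln_c0 = 0.0000930 * stress_range0 + 0.0000391 * mean_stress0 - 12.61
    return math.exp(ln_c0)

# Stress range 38,000 psi and mean stress 21,000 psi (the 4002 specimens).
print(initial_factor_eq13(38000.0, 21000.0))
```

For these stresses the relation gives roughly 0.00026, within about 2 per cent of the average C₀ tabulated for that stress combination.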


Substituting Eq. (13) into Eq. (10), the crack propagation relation 
becomes 

$$N - N_0 = e^{\,12.61 - 0.0000930\,\Delta\sigma_0 - 0.0000391\,\sigma_{m0}}\left\{-Ei[-(p_0 - D)] + Ei[-(p - D)] - e^{D}\left[-Ei(-p_0) + Ei(-p)\right]\right\} \tag{14}$$


The relation between the crack length l and the number of 
cycles of load N can be calculated either by using Eq. (10) with 
the individual C₀ for each specimen, or by using Eq. (14), which 
comprises not only the statistical scatter of the individual C₀ from 
the average value but also the deviation of the average value from 
the logarithmic relationship of Eq. (13). 

The result of calculation for specimen 3614b by Eq. (10) is 
shown in Fig. 4 as the broken curve. The theoretical curve 
coincides with the experimental results during the middle stage of 
crack propagation. In the final stage, the theoretical curve lies 
to the right of the experimental curve. However, the deviation 
is small and consistent. The ratios of the experimental to the 
theoretical propagation lives, (N_fe − N₀)/(N_ft − N₀), are tabulated 
in Table 1 for each specimen. The values ranged from 
0.69 to 0.84, and the average was 0.77. 

The result of calculation by Eq. (14) is shown as a broken curve 
in Fig. 8, and the solid curves represent the experimental data for 
various specimens denoted by the lower case letters. The 
theoretical curve in Fig. 8 shows the same general characteristics 
as the one in Fig. 4. The ratio (N_fe − N₀)/(N_ft′ − N₀) thus 
calculated has an average value of 0.76 with a range of 0.66 to 
0.94. 

The crack propagation formula for a finite sheet, Eq. (10), was 
derived from the crack propagation formula for a semi-infinite 
sheet, Eq. (8), but assuming Eq. (9) for the expression of the 
change of the propagation factor with respect to an increase of 
crack length. From Eq. (10), Eq. (14) was derived by assuming 
Eq. (13) as the relation between the propagation factor and the 
stress range and mean stress. Therefore, Eqs. (10) and (14) are 
valid only for the material used in this investigation while 
Eq. (8) is valid for any material. Additional experimental 
investigations are needed to determine whether expressions similar 
to Eqs. (10) and (14) are applicable for other materials. However, 
the general approach of adjusting C for change of stress as 
the crack propagates can be applied to other materials. 

Fig. 8 Combined fatigue crack propagation diagram 

The values of N₀, the number of cycles for crack initiation 
determined from the experimental data by extrapolation of the 
straight line to l₀, as shown in Fig. 4, were tabulated in Table 1. 
These values of N₀, because they are defined in an arbitrary 
manner, are fictitious; however, they serve as a useful measure 
of the initiation period. The statistical scatter of the initiation 
period was investigated by determining the parameters of the 
statistical distribution of the ratio N₀/N₀(mean), assuming a normal 
distribution. The quantity N₀(mean) is the mean value of N₀ 
for specimens subjected to the same stress range and mean stress. 
The standard deviation of the ratios was 0.127. 

The propagation life can be specified in terms of (N_fe − N₀). 
In a similar manner, the ratio (N_fe − N₀)/(N_fe − N₀)mean can 
be formed, where (N_fe − N₀)mean is the mean value for specimens 
subjected to the same stress range and mean stress. The standard 
deviation was computed as 0.052. As shown by the standard 
deviations, the scatter of the initiation period is more than twice 
as large as that of the propagation period. This is not unexpected, 
since crack initiation is a more localized phenomenon than 
crack propagation. 
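
The normalization used in this scatter analysis can be sketched as below. This is only an illustration of forming the ratios within each stress combination and pooling them; the function name is hypothetical, only four of the stress combinations from Table 1 are included, and the use of the sample standard deviation is an assumption, so the result is not expected to reproduce the figure of 0.127 quoted for the full data set.

```python
import statistics

def normalized_ratios(groups):
    """Divide each value by the mean of its own group and pool the ratios."""
    ratios = []
    for values in groups.values():
        group_mean = statistics.fmean(values)
        ratios.extend(v / group_mean for v in values)
    return ratios

# N0 values in cycles for four stress combinations, as listed in Table 1.
n0_groups = {
    "4002": [12310, 10170, 9890],
    "4006": [13390, 12790, 11440],
    "4010": [17760, 19400],
    "4014": [29610, 26560, 17610, 28930],
}

ratios = normalized_ratios(n0_groups)
scatter = statistics.stdev(ratios)   # sample standard deviation (assumed choice)
print(len(ratios), scatter)
```

Normalizing by the group mean removes the strong dependence of N₀ on stress level, so that specimens tested at different stresses can be pooled into one distribution.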

Photomicrographs of the crack tips, both before and after 
polishing and etching, are shown in Fig. 9 at a magnification of 
200×. Opposite tips of the same crack are shown in Fig. 9(a, b, 
and c). Fig. 9(b and c) show the same tip of the crack before 
and after polishing and etching, and Fig. 9(a) shows the opposite 
tip of the same crack after polishing and etching. The same tip 
of another crack before and after polishing and etching is shown 
in Figs. 9(d and e), and the tip of a third crack is shown in Figs. 
9(f and g). These pictures were taken from three specimens each 
subjected to the same stress range, 30,000 psi, and the same mean 
stress, 21,000 psi. The over-all crack lengths are given below each 
group of the pictures. 

It may be observed in Figs. 9(a, b, and c) that one side of the 
crack is longer than the other, and two small cracks were present 
at the shorter side. The stress at the tips of these two small 
cracks will be less than the stress at the opposite tip. Consequently, 
the crack will propagate faster on one side than the other. 
If the stress is sufficiently reduced, the two short cracks may become 
dormant until the other end of the crack has propagated 
enough to increase the stresses and cause one of these two small 
cracks to propagate further. One such example is shown in Fig. 
10. The crack propagated on only one side until 111,490 cycles 
were applied. This phenomenon produced two linear portions of 
the crack propagation curve. The intersection of these two lines 
indicates the beginning of crack propagation at the other side of 
the hole. 

Fig. 9 Photomicrograph of fatigue crack tips (×200) 

Occasionally the cracks split into several branches as shown in 
Figs. 9(e and g). Branching has the effect of reducing the stress 
at the tip of crack. This may be the cause of the frequently ob- 
served hesitation periods. One such example of a hesitation 
period is shown in Fig. 11. 

The size of the plastic zone at the crack tip increased with crack 
length as shown in Figs. 9(b, d, and f) by the slip lines. No 
visually detectable plastic deformation was present in Fig. 9(a). 
However, the plastic zone in Fig. 9(f) is much larger than that 
shown in Fig. 9(d). This observation is in qualitative agreement 
with the assumptions made in the derivation of Eq. (8). 
Fig. 10 Fatigue crack propagation diagram, 2024-T3 aluminum alloy 


Concluding Remarks 

The major results of this investigation may be summarized as 
follows: 

1 An expression was derived for the crack length l in a semi-infinite 
sheet undergoing repeated loading in terms of number of 
cycles of loading N and a stress dependent propagation factor C. 

2 The experimental results indicated that the propagation 
life can be divided into three periods: the initial, the middle, and 
the final. In the initial period, the crack propagated sporadically 
and slowly. In the middle period, measurements of crack propagation 
agreed with the prediction of Eq. (8); this period consumed 
from 40 to 50 per cent of the propagation life. In the final 
period, crack propagation was greatly accelerated due to two 
effects: an increase in stress range and mean stress as the net 
section was reduced, and a disproportionate increase in the size 
of the plastic zone which changed the stress distribution and the 
mechanism of fracture. 

3 The crack propagation relation for a semi-infinite sheet was 
modified to account for the effect of an increasing stress range and 
mean stress as the crack propagated in a finite specimen. The 
propagation life predicted by the modified equation was a constant 
proportion of the experimental propagation life. Therefore, 
accurate prediction of propagation life appears possible. 

4 The propagation factor C in Eq. (8) was found to depend 
on the applied stresses. The relationship between propagation 
factor and the stress range Δσ and the mean stress σ_m was 
determined experimentally for 2024-T3 aluminum alloy. 

5 The scatter of the length of the crack initiation period was 
found to be more than twice as large as that of the propagation period. 

6 The photomicrographs indicated that the size of the plastic 
zone at the tip of a crack increases as the crack propagates. 
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Fracture of Flat and Curved Aluminum 
Sheets With Stiffeners Parallel to the Crack 


The mode of crack propagation and failure in relatively large 2024-T3 aluminum sheets
reinforced with stiffeners parallel to the crack direction has been investigated. Curved
specimens with a 69-in. radius of curvature as well as flat panels were subjected to
uniaxial tension perpendicular to a simulated crack to study the effects of curvature,
crack location, and stiffener spacing. An increase in strength due to stiffening,
particularly in the curved panels, was observed, although these specimens exhibited
considerably lower crack strength than flat ones. For the specimens tested, crack location
as well as variations of stiffener spacing from 3 to 12 in. had no appreciable effect on
either critical crack length or failure stress.


Introduction


The study of crack propagation and fracture in engineering structures involves the
consideration of a large number of parameters. Because of this difficulty and to obtain
some basic answers, many investigations concerned with the formation and
propagation of cracks in metals have been performed on flat
sheets or plates subjected to simple tension. Based on the
Griffith equation [1], many analyses and experimental investigations,
too numerous to list, were conducted to determine a proportionality
constant for critical fracture conditions. Using flat
tension specimens containing cracks, Sorensen [2] and others
[3, 4, 5] showed that the stress at crack instability varies inversely
with the square root of crack length. Investigating the
effect of some of the structural parameters on crack propagation,
Peters and Kuhn [6] described the curvature effect in cylinders with
slits, while Irwin [7] recently showed the influence of plate
thickness on crack propagation, and Romualdi, Frasier, and
Irwin [8] have made an analysis of the rivet force influence when
a crack propagates perpendicular to stiffeners riveted to an
aluminum sheet, as encountered in aircraft structures.

This paper describes the experiments and results of a phenomenological
study of the crack growth and failure conditions in 0.040-in.-thick
2024-T3 clad aluminum sheets with riveted J-stiffeners
parallel to an existing crack, as illustrated in Fig. 1. It was of interest
to observe the effect of the spacing of stiffeners having two
different thicknesses, namely, 0.032 in. and 0.072 in., as well as
the location of the crack relative to the stiffeners. The three
crack locations in the aluminum sheet were (a) halfway between
stiffeners, (b) in the rivet row of the centrally located stiffener,
and (c) immediately adjacent to the center stiffener. In addition,
two curved and stiffened specimens containing a crack and having
unrestrained circumferential edges were subjected to air pressure,
thereby obtaining critical hoop stresses for comparison with the
critical tensile stresses in identical flat specimens. The crack
propagation characteristics in flat and curved 2024-T3 aluminum
sheets with centrally located cracks have been reported from previous
experiments for various initial crack lengths [4, 5]. In order
to compare the results of this investigation with those of unstiffened
specimens, the curves of gross-area stress versus crack length
for flat and curved 0.040-in.-thick aluminum sheets are shown in
Figs. 4(a and b), respectively. Since in this particular investigation
the initial crack length was held constant at $x_0$ = 6.5 in. in
the flat sheets and $x_0$ = 8 in. in the curved ones, Figs. 4(a and b)
show the estimated failure stress for the same initial crack length
in the unstiffened specimens. The values thus obtained have been
used to show the effects of adding stiffeners, their spacing and
location relative to the crack, as well as the effect of curving the
sheets to the 69-in. radius. This curvature was selected for the
purpose of obtaining some basic answers on fracture in aluminum
when used in such aircraft structural configurations as large
fuselages.
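
The inverse square-root dependence of instability stress on crack length cited in the
Introduction can be illustrated with a short calculation. The sketch below is not the
authors' analysis. It assumes the simple form sigma_c = C / sqrt(x), ignores finite-width
and stiffener effects, and calibrates the hypothetical constant C from the single
unstiffened flat-sheet estimate quoted in this paper (25.5 ksi at a critical crack length
of 8.5 in.). The function and variable names are illustrative only.

```python
import math


def instability_stress(crack_length_in: float, constant: float) -> float:
    """Stress at crack instability for the assumed law sigma_c = C / sqrt(x)."""
    if crack_length_in <= 0:
        raise ValueError("crack length must be positive")
    return constant / math.sqrt(crack_length_in)


# Assumption: calibrate C from one quoted point (25.5 ksi at 8.5 in.).
C_KSI_SQRT_IN = 25.5 * math.sqrt(8.5)

if __name__ == "__main__":
    for x in (4.0, 8.5, 12.0, 34.0):
        print(f"x = {x:5.1f} in.  sigma_c = {instability_stress(x, C_KSI_SQRT_IN):6.2f} ksi")
```

Under this assumed law, quadrupling the crack length halves the instability stress. The
constant has no validity beyond the single point used to set it.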


Experimental Technique 


The sheet material for the specimens used in this investigation 
was 2024-T3 clad aluminum of 0.040 in. nominal thickness. The 
flat specimens of width B = 36 in. were 83 in. long including the 
grip sections, thereby allowing for an effective length/width ratio 
of 2. The curved specimens of width B = 40 in. had a radius of 
curvature of 69 in. and a circumferential length of 58 in. In both 
groups of specimens the grain orientation of the material was in 
the length direction, namely, the direction of the applied stress. 
The initial crack shown in Fig. 1 was simulated by making a cut 
with a 0.008-in. jeweler's saw. The nominal initial length $x_0$ of
the crack was 6.5 in. for all flat specimens and 8.0 in. for the
curved specimens; the resulting $B/x_0$ ratios of 5.5 and 5,
respectively, were sufficiently close to permit comparisons. The
stiffeners, which were fastened to the sheets with °/,:-in. countersunk
aluminum rivets, were also made from 2024-T3 clad aluminum
sheets. These standard stiffeners of thickness 0.032 and
0.072 in. had the dimensions shown in Fig. 1. The test condi- 
tions and results of unstiffened flat and curved sheet specimens 
with cracks propagating under uniaxial tension have been re- 
ported previously [4, 5] and the curves in Fig. 4 show the initial 
crack growth and failure conditions. The crack extensions during 
slow propagation were measured with a 0.01-in. least count scale 
mounted on the specimens. Increments of crack length were re- 
corded as the load or air pressure was increased to a maximum 
after which the crack became self-propagating. 

Fig. 2 Flat crack propagation specimen with J-stiffeners (a) before and
(b) after failure in a 400,000-lb tensile test machine


The crack propagation tests for the flat specimens with stiff- 
eners were conducted in a 400,000 lb tensile testing machine with 
the load applied through special jig plates. A typical specimen
mounted in the machine before and after failure is shown in Figs. 2 
(a and b). 

The curved specimens which are shown in Fig. 3 before and 
after failure were bolted into a special fixture. To provide pure 
hoop-tension in the sheet, the circumferential edges of the speci- 
mens were free to move during the application of air pressure to 
the inside surface, as illustrated in Fig. 1. Loading of the speci- 
mens in increments of 0.5-psi air pressure was controlled with a 
regulating valve and pressure measurements were recorded on a 
mercury manometer. Considering the specimen size and radius of
curvature, the gross-area stress was calculated as the hoop stress
from the equation $\sigma = pR/t$, where $p$ is the air pressure in psi, $R$ the
radius of curvature (69 in.), and $t$ the sheet thickness (0.040 in.).
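
As a worked illustration of the hoop-stress relation above, the following minimal sketch
evaluates sigma = pR/t for the stated radius and thickness. The function names are
illustrative and not taken from the paper. The 23.8-ksi figure is the curved-specimen
failure stress reported in the Results.

```python
RADIUS_IN = 69.0       # radius of curvature stated in the text
THICKNESS_IN = 0.040   # nominal sheet thickness stated in the text


def hoop_stress_psi(pressure_psi: float,
                    radius_in: float = RADIUS_IN,
                    thickness_in: float = THICKNESS_IN) -> float:
    """Gross-area hoop stress, sigma = p * R / t, in psi."""
    return pressure_psi * radius_in / thickness_in


def pressure_for_stress_psi(stress_psi: float,
                            radius_in: float = RADIUS_IN,
                            thickness_in: float = THICKNESS_IN) -> float:
    """Air pressure needed to reach a given hoop stress, p = sigma * t / R."""
    return stress_psi * thickness_in / radius_in


if __name__ == "__main__":
    # Each 0.5-psi pressure increment raises the hoop stress by about 862.5 psi.
    print(f"stress per 0.5-psi step: {hoop_stress_psi(0.5):.1f} psi")
    # Pressure corresponding to the reported 23.8-ksi failure stress.
    print(f"pressure at 23.8 ksi:    {pressure_for_stress_psi(23_800.0):.2f} psi")
```

The 0.5-psi loading increment therefore corresponds to stress steps of somewhat less than
1 ksi, which bounds the resolution of the recorded stress values.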

After completion of the tests, standard tensile tests were per- 
formed on coupons made from the sheets near the grip sections of 
each of the specimens. 

Fig. 3 Curved crack propagation specimen with J-stiffeners (a) before and (b) after
failure in experimental air pressure jig


Results and Discussion 


To obtain a comparison between the results of this investigation
and the data based on tests of unstiffened 0.040-in.-thick
aluminum sheets, the crack propagation curves for flat and
curved unstiffened sheets are shown in Figs. 4(a and b), respectively.
For an initial crack length of $x_0$ = 6.5 in. in the flat sheet
of Fig. 4(a), complete failure is estimated at a gross-area stress
of 25.5 ksi when a critical crack length of 8.5 in. is reached. A
similar estimate for $x_0$ = 8.0 in. in a curved sheet, as shown in Fig.
4(b), gives a gross-area stress of 15 ksi at failure with a corresponding
crack length of 8.7 in. The effect of curvature, resulting in
lower crack strength and less crack growth during slow propagation,
is clearly demonstrated in these curves as well as in the data shown
in Table 1.

Increase in the failure stress to 23.8 ksi is shown in Figs. 5 and 
6 for curved sheets which have been stiffened parallel to the crack 
with 0.072 in. thick J-stiffeners at a spacing of 6 inches. The in- 
stantaneous length of a crack located in the rivet line of the center 
stiffener has been plotted in Fig. 5 for a curved specimen subjected 
to air pressure in the jig shown in Fig. 3. The test data of a simi- 
lar specimen, with the crack located halfway between the two 

Fig. 4 Initial crack growth and failure stresses versus instantaneous
crack length in (a) flat and (b) curved, 0.040-in. 2024-T3 clad aluminum sheets

Fig. 6 Mode of crack propagation in curved and pressurized 2024-T3
clad aluminum sheet with 0.072-in. J-stiffeners. (Crack location is halfway
between stiffeners at S = 6 in. spacing.)


Table 1 Specimen configurations and test data

                                                      J-stiffeners          Crack length        Gross-area stress
Spec.  Configuration and                              Spacing,  Thickness,  Initial,   At max   Start of crack  At max
No.    crack location                                 in.       in.         x_0, in.   load,    growth, ksi     load, ksi
                                                                                       in.
 1     Curved sheet, crack between stiffeners          6.0      0.072       8.00       10.6     12.9            23.8
 2     Curved sheet, crack in rivet row                6.0      0.072       7.96        9.16    13.8            23.8
 3     Flat sheet, crack in rivet row                  6.0      0.072       6.53        8.5     18.0            28.0
 4     Flat sheet, crack adjacent to stiffener         6.0      0.072       6.54        8.14    18.75           32.3
 5     Flat sheet, crack halfway between stiffeners    3.0      0.032       6.50        9.99    20.8            34.7
 6     Flat sheet, crack halfway between stiffeners    3.0      0.072       6.53        9.56    22.2            32.2
 7     Flat sheet, crack halfway between stiffeners    6.0      0.032       6.53        9.39    20.1            32.9
 8     Flat sheet, crack halfway between stiffeners    6.0      0.072       6.53       10.38    18.0            32.6
 9     Flat sheet, crack halfway between stiffeners    9.0      0.032       6.51        9.65    21.5            32.8
10     Flat sheet, crack halfway between stiffeners    9.0      0.072       6.48       10.82    20.8            32.6
11     Flat sheet, crack halfway between stiffeners   12.0      0.072       6.54       10.38    21.5            31.9

Fig. 8 Effect of stiffener spacing (S = 3 in. to 12 in.) on fracture mode of flat 2024-T3
clad aluminum sheets with a crack halfway between 0.032-in. and 0.072-in. J-stiffeners


center stiffeners, are shown in Fig. 6. Considerably less crack 
growth can be observed in Fig. 5, where the crack grew into the 
nearest rivet holes and failure at a hoop-stress of 23.8 ksi took 
place because of reduced net area as well as local buckling of the 
sheet in the rivet row. It is of interest to compare the specimens 
just discussed with equivalent flat specimens for which experimental
results are shown in Figs. 7(a and b) and which are listed
as specimens 3 and 8 in Table 1. In the case of the crack in the 
rivet row the flat specimen failed at 28 ksi as compared to the 
23.8 ksi failure stress in the curved sheet. However, in the flat 
specimen the crack also grew into the nearest rivet holes without 
further extension, as shown in Fig. 7(b). A more pronounced dif- 
ference can be observed between Figs. 6 and 7(a) which show a 
failure stress of 32.6 ksi in the flat specimen against 23.8 ksi in the 
curved one, the difference of 8.8 ksi being sufficiently large to be 
considered in curved aluminum sheet structures. From the
tensile coupon tests for the curved specimens an average tensile
yield strength $\sigma_{yp}$ = 50 ksi at 0.2 per cent offset and an ultimate
tensile strength of 66.4 ksi were observed. Owing to the crack,
the curved specimens therefore failed at 47.7 per cent of their
yield strength or 35.9 per cent of their ultimate strength. These
strength reduction ratios are lower than those of Fig. 11 for both
stiffened and unstiffened flat specimens, again illustrating
the considerable effect of curvature.
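
The strength reduction ratios quoted above follow from dividing the failure stress by the
coupon strengths. The sketch below repeats that arithmetic with the rounded values printed
in the text (23.8, 50, and 66.4 ksi); the names are illustrative. With these rounded inputs
the results come out near 47.6 and 35.8 per cent, slightly below the quoted 47.7 and 35.9
per cent, a difference that presumably reflects rounding of the reported strengths.

```python
FAILURE_STRESS_KSI = 23.8   # curved, stiffened specimens (Table 1)
YIELD_KSI = 50.0            # 0.2 per cent offset yield, curved-specimen coupons
ULTIMATE_KSI = 66.4         # ultimate tensile strength, curved-specimen coupons


def strength_ratio(failure_stress: float, reference_strength: float) -> float:
    """Fraction of a reference strength reached at failure."""
    if reference_strength <= 0:
        raise ValueError("reference strength must be positive")
    return failure_stress / reference_strength


if __name__ == "__main__":
    r_yield = strength_ratio(FAILURE_STRESS_KSI, YIELD_KSI)
    r_ult = strength_ratio(FAILURE_STRESS_KSI, ULTIMATE_KSI)
    print(f"fraction of yield strength:    {100 * r_yield:.1f} per cent")
    print(f"fraction of ultimate strength: {100 * r_ult:.1f} per cent")
```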

The mode of crack propagation in flat sheets is illustrated in 
Fig. 7 for specimens 3, 4, and 8 of Table 1, all of which have 0.072 
in. J-stiffeners located at 6 in. intervals parallel to the crack. In 
addition to having the crack in the rivet row or halfway between
stiffeners similar to curved specimens, a specimen with the crack 
immediately adjacent to the center stiffener was tested to failure 
and the results are shown in Fig. 7(c). The existence of a crack 
in the rivet row of a stiffener constitutes a potentially more criti- 
cal condition. As shown in Fig. 7(b) both the maximum gross 
area stress and the stable crack growth are less than for cracks 
existing in sheet areas away from the stiffener attachment. How- 
ever, the restraint of the stiffener on crack growth can be noticed 
since the crack adjacent to the stiffener, Fig. 7(c), grew to a 
Fig. 9 Max gross-area stress versus stiffener spacing for flat 2024-T3
clad aluminum sheets with 0.032 in. or 0.072 in. J-stiffeners 


length of 8.14 in. while the one 3 in. away reached a critical length 
of 10.6 in. as shown in Fig. 7(a). 

The pattern of crack propagation for various distances between 
stiffeners is shown in Fig. 8 for spacings of 3, 6, 9, and 12 inches. 
Although 0.032 and 0.072 in. thick J-section stiffeners were used, 
no appreciable variations in propagating characteristics are 
noticeable. The curve of Fig. 9 shows a very slight decrease of 
maximum gross area stress at failure as the stiffener spacing is in- 
creased to 12 inches. However, the critical crack length remained 
essentially the same regardless of the spacing as indicated by the 
curve in Fig. 10. The over-all effectiveness of stiffening is demon- 
strated in Figs. 9 and 10 by the increase in strength as well as 
crack length at which initiation of catastrophic failure will take 
place. 
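
The trends described in this paragraph can be checked directly against rows 5 to 11 of
Table 1 (flat sheets, stiffener spacings of 3 to 12 in.). The sketch below is an
illustrative summary and not part of the original analysis. It averages the maximum
gross-area stress for each spacing and compares the overall mean with the 25.5-ksi
estimate for the unstiffened flat sheet. The data values are copied from Table 1; the
names are hypothetical.

```python
from statistics import mean

# (spacing in., stiffener thickness in., crack length at max load in., max stress ksi)
FLAT_ROWS_5_TO_11 = [
    (3.0, 0.032, 9.99, 34.7),
    (3.0, 0.072, 9.56, 32.2),
    (6.0, 0.032, 9.39, 32.9),
    (6.0, 0.072, 10.38, 32.6),
    (9.0, 0.032, 9.65, 32.8),
    (9.0, 0.072, 10.82, 32.6),
    (12.0, 0.072, 10.38, 31.9),
]
UNSTIFFENED_FLAT_KSI = 25.5  # estimated failure stress of the unstiffened flat sheet


def mean_stress_by_spacing(rows):
    """Average maximum gross-area stress for each stiffener spacing."""
    groups = {}
    for spacing, _thickness, _length, stress in rows:
        groups.setdefault(spacing, []).append(stress)
    return {spacing: mean(values) for spacing, values in sorted(groups.items())}


by_spacing = mean_stress_by_spacing(FLAT_ROWS_5_TO_11)
overall_mean = mean(row[3] for row in FLAT_ROWS_5_TO_11)
relative_gain = overall_mean / UNSTIFFENED_FLAT_KSI - 1.0

if __name__ == "__main__":
    for spacing, stress in by_spacing.items():
        print(f"S = {spacing:4.1f} in.  mean max stress = {stress:.2f} ksi")
    print(f"overall mean = {overall_mean:.2f} ksi, gain over unstiffened = {100 * relative_gain:.0f} per cent")
```

The per-spacing means fall only slightly as the spacing grows from 3 to 12 in., while every
stiffened value lies well above the unstiffened estimate.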

The material properties of the aluminum sheets, namely, ten- 
sile yield strength at 0.2 per cent offset as well as ultimate strength, 
were again used for determining the strength reduction due to a 
crack. The ratio of maximum stress at failure to yield strength 
$\sigma_{max}/\sigma_{yp}$ as a function of stiffener spacing $S$ is plotted in Fig. 11(a),
while the ratio of maximum stress to ultimate tensile strength versus $S$ is shown in Fig. 11(b).

The increase in efficiency due to stiffening a sheet with an exist- 
ing crack is shown by the increase of the strength reduction ratio 
from 50.7 to 65 per cent of the yield strength and from 38.5 to 
50 per cent of the ultimate tensile strength of the aluminum sheets. 

It is recognized that the numerical results are unique with re- 
gard to the selected specimen geometries and test conditions. 
However, it is believed that, in aluminum structures of the type 
discussed, the propagation characteristics of cracks running paral- 
lel to the stiffeners will remain essentially similar to those of this 
investigation. 


Conclusions 


1 Crack propagation tests of 0.040-in. 2024-T3 clad aluminum
sheets show that specimens curved to a 69-in. radius will fail at a
uniaxial tensile stress of 15.0 ksi for a sheet width/initial crack
length ratio of 5, as compared to a 25.5-ksi failure stress for a flat
sheet with a sheet width/initial crack length ratio of 5.5. The
critical crack lengths at those maximum gross-area stresses were
essentially the same for both types of specimen.

2 A crack running parallel to stiffeners riveted to a curved
aluminum sheet will cause complete collapse of the structure at a
tensile stress lower than that encountered in similar flat aluminum
sheets.

3 The existence of an 8.0-in.-long crack parallel to the stiffeners
in a curved aluminum sheet reduced the strength to
47.7 per cent of the yield strength or 35.9 per cent of the ultimate
tensile strength of the sheet material.

4 The location of a crack parallel to stiffeners riveted to a 
flat aluminum sheet does not appreciably affect its mode of propa- 
gation. Cracks in the rivet row or immediately adjacent to it 
Fig. 11 Maximum crack propagation failure stress in relationship to (a)
tensile yield strength and (b) ultimate tensile strength as a function of
stiffener spacing in flat 2024-T3 clad aluminum sheets with 0.032 or
0.072-in. J-stiffeners


exhibited less crack growth than those further away from a 
stiffener. 

5 The maximum gross-area stress at the time of complete 
failure due to a crack running parallel to stiffeners on a flat 
aluminum sheet increases only very slightly as the spacing of 
those stiffeners is decreased from 12 to 3 inches. 

6 For the specimens in item 5, the critical crack lengths at 
onset of collapse were essentially the same for stiffener spacings of 
3, 6, 9, and 12 inches. 

7 With an initial crack of 6.5-in. length, 36-in. wide 2024-T3 
clad aluminum sheets failed at 50.7 per cent of yield strength 
or 38.5 per cent of ultimate tensile strength. For the same 
specimens the existence of J-stiffeners parallel to the crack increased
the strength to 65 per cent of yield strength or 50 per cent
of ultimate tensile strength regardless of the spacing of the 
stiffeners. 


Theories of Fracture Under Combined Stresses


Experimental data are presented for a cast nodular iron and for high-silicon cast
iron, materials which represent limiting conditions of ductility in a test of fracture
theories. Data of other pertinent investigations are reviewed and various failure
theories are discussed with regard to their applicability. It is concluded that failure
under combined stresses of brittle materials can be predicted adequately by applying a
notch modified distortion energy criterion. However, other criteria are also applicable.
Since criteria for the failure of brittle materials are conspicuously phenomenological,
the application of solid state and dislocation theory to explain the initiation and growth
of a crack in a heterogeneous structure under combined stresses has been discussed.


Introduction 


A considerable amount of information is now available
on the behavior under combined stresses of brittle materials
such as cast iron. The pertinent work is summarized in Table 1.
An examination of the work done leaves a distinct impression as
to the thoroughness of the earlier investigators. Present theories
may permit a generalization on fracture of brittle materials under 
combined stresses and indicate limitations of the theories. 

The fracture of brittle materials such as glass, prestressed 
concrete, and cast iron, under combined stresses is actually a 
problem of considerable practical importance. Design with these 
materials has been relatively empirical in the past and safety 
factors have occasionally been less safe than nominally indicated. 


Experimental Procedures 


The present investigation has tested theories of brittle fracture 
by using a brittle material such as high-silicon cast iron and a 
relatively ductile material such as nodular iron. 

The investigations of cast iron under biaxial stress have uti- 
lized tubular specimens subjected to internal pressure and axial 
tension or compression. This type of specimen is optimum be- 
cause it permits the obtaining of various states of combined stress 
with simplicity. Most of the investigations referred to in Table
1 utilized an incremental method of loading which permits
stressing according to a desired ratio of axial to tangential stress,
sometimes referred to as “radial loading.” For pure compression,
investigators have also used solid cylinders and short tubes. 

In the present investigation, specimens were stressed along pre- 
determined stress ratios until failure occurred. Internal pressure 
was measured by a calibrated bourdon-type gage accurate to plus 
or minus '/; per cent. Axial loading of the specimens was done on 
a 200,000 pound Southwark-Emery Testing Machine, using the 
10,000 pound scale with a least division of 10 pounds. 

Tube and solid cylindrical specimens were machined from 
vertically cast 1.25-in. diameter 18-in. long transverse test bars 
all from the same foundry heat. Dimensions of the specimens for 
the nodular iron are shown in Fig. 1. 

High-Silicon Cast Iron. The specimens used were initially received
in the form of 1.25-in. diameter, 18-in. long transverse test
bars. The specimen detail is shown in Fig. 2.

The testing of the high-silicon cast iron specimens presented 
difficult problems in machining and in test procedures. 

Many of the tubes machined were found to be porous. In the
process industries where “Corrosiron,” “Duriron,” or other
high-silicon irons are used, this is no problem because on brief
exposure to corrosives a corrosion product is developed which seals
the pores, particularly since heavy sections are used. In this
investigation, the porosity was handled by impregnating the
specimens inside and out, under vacuum, with a low viscosity solution
of Lucite acrylic plastic in ethylene dichloride.


Table 1 Selected static biaxial stress investigations

Investigator       Date   Reference  Specimen                      D_o/D_i     Material          Stress quadrant
Cook & Robertson   1911   [1]        Thick wall tube, closed end   1.30-2.96   Cast iron         T-T
Ros & Eichinger    1926   [2]        Thin wall, closed end         1.20        Cast iron, steel  T-T
                   1929   [3]
Siebel & Maier     1933   [4]        Thin wall                     1.18        Steel, brass      T-T; T-C
Grassi & Cornet    1948   [5]        Thin wall                     1.12        Gray cast iron    T-T; T-C
Coffin             1949   [6]        Thin wall                     1.10        Gray cast iron    T-T; T-C; C-C
Cornet & Grassi    1953   [7]        Thin wall                     1.12        Inoculated iron   T-T; T-C
Bresler & Pister   1955   [8]        Thick wall                    1.50        Concrete          T-C
Cornet & Grassi    1956   [9]        Thin wall                     1.33        Nodular iron      T-T; T-C
Clough & Shank     1956   [10]       Thin wall                     1.10        Nodular iron      T-T; T-C
Clough & Shank     1956   [11]       Thin wall                     1.10        Gray cast iron    T-T; T-C
Cornet & Grassi    1956   [12]       Thick wall                    1.40        Silicon iron      T-T; T-C

D_o = outside diameter, in.
D_i = inside diameter, in.
T-T = tension-tension quadrant
T-C = tension-compression quadrant
C-C = compression-compression quadrant

Microstructure. As may be seen from Fig. 3, the microstructure of
the nodular iron investigated is normal, and the graphite is in
nodular or spheroidized form. In the gray cast iron [5]¹ and
calcium silicide inoculated iron [7] previously investigated the
graphite was in flake form. The high-silicon cast iron has graphite
flakes and sharp needle-like structures. Although the properties


¹ Numbers in brackets designate References at end of paper.

Fig. 2 High-silicon iron test specimen

Fig. 3 Photomicrographs, nodular, high-silicon, and inoculated cast iron

of the inoculated iron met the specifications of ASTM Class No.
50 [13], the graphite flakes of this inoculated iron were but little 
smaller in size than the graphite flakes of the gray cast iron, and 
possibly were not representative of a normally fine grained inocu- 
lated iron. 

Analyses. Specifications of cast iron normally subordinate
chemical composition to physical properties [13]. Analyses of 
some of the materials discussed are presented in Table 2. 


Table 2 Composition of materials studied, per cent


Cast iron       C       Si       P        S        Mn
Nodular         2.30     1.92    0.052    0.017    0.36
High silicon    1.15    15.87    0.042    0.010    0.65
Inoculated      2.84     2.95    0.180    0.087    1.38
Gray            3.48     2.41    0.310    0.140    0.52

Results 


The following series of figures show the experimental data which 
are sufficiently complete to permit evaluation of existing theories. 
Fig. 4 contains data on gray cast iron, Grassi and Cornet [5], 
L. F. Coffin, Jr., [6], Clough and Shank [11], and inoculated iron, 
Cornet and Grassi [7]. In Fig. 5 are the data for nodular 
iron, Cornet and Grassi [9], and Clough and Shank [10]. In con- 
trast to the data for high-silicon iron, Cornet and Grassi [12], and 
concrete, Bresler and Pister [8], shown in Fig. 6, data for nodular 
iron are presented in Fig. 7 showing the fracture theories. All 
these data appear consistent and complete enough to justify 
their use in the interpretation of exact fracture theories. This is 
somewhat remarkable in view of the range of the tensile strength 
(10,000-90,000 psi) and compression strength (75,000-150,000
psi).


Fig. 4 Fracture stress for various stress ratios, gray cast iron and
inoculated iron

Fig. 5 Fracture of nodular iron under biaxial stress


Discussion 


Fracture of Nodular Iron. All failures observed of the nodular 
iron started by local plastic deformation, the cracks originating 
presumably from some small nuclei, with the state of stress acting 
as the determining factor for the propagation of failures. The 
fracture pattern changed with various ratios of axial stress to 
tangential stress, here referred to as $n$. For $n = +\infty$ to $n = +1$
failures propagated transverse to the axial load. At stress ratios
$n = +0.75$ and $+0.50$ failures propagated in an axial direction
perpendicular to the tangential stress. At the stress ratio $n = 0$,
and in the compression quadrant to $n = -\infty$, the axial tension
component of stress is absent; hence the failure which initiated 
merely relieved the tangential stress by forming a small crack, 
tear, or hole. Since there was no sudden increase in axial load 
there was no axial crack propagation. 

The nodular iron investigated had an average tensile strength 
of 60,200 psi, based on four tubes tested. The average compressive
strength was 77,200 psi ± 10 per cent, based on nine tubular
specimens. The high tensile strength of nodular iron is given 
much emphasis in the literature, but it is seldom if ever noted 
that the compressive strength of nodular iron is relatively low. 
Possibly the superior ductility of nodular iron permitted enough 
plastic deformation in compression to introduce tension stresses 
at the bulges and cause tensile failures at the bulges. This ex- 
planation implies, of course, that the true stress at fracture is not 
known or is indeterminate, as indicated by Clough and Shank 
[10]. While bulging in pure compression is considerable, it is
believed that the true stresses are known to within ±5 per cent

Fig. 6 Fracture stress of high-silicon iron and concrete under biaxial stress


accuracy. There was relatively little scatter of results in the ten- 
sion-tension quadrant for the nodular iron, as shown in Fig. 5. 
Fracture of High-Silicon Iron. High-silicon iron tubes under biaxial 
stress fractured with negligible ductility for all ratios of axial to 
tangential stress. Under tension stress the tubes broke transverse 
to the axial load. For the stress ratios +1 and +'/, brittle 
blowout occurred, with some fragmentation. Tubes with in- 
ternal pressure (tangential tension) and compressive loading 
failed by the formation of longitudinal cracks, often three or four 
longitudinal cracks per tube. In pure compression, solid cylinder 
and tubular specimens broke into many small grains and pieces. 
The high-silicon iron had an average tensile strength of 10,700 psi 
and an average compressive strength of 74,800 psi. This gives a 
ratio of compressive to tensile strength, $K$ = 7.0, as compared to
$K$ = 1.28 for nodular iron and $K \approx 3.2$ for gray cast irons. For
comparison it is interesting to note that for mild steel $K$ = 1,
for glass and “Griffith material” $K$ = 8.0, and for concrete $K$ =
10. Consequently, considering this criterion, the high-silicon iron
does not behave like metal but approximates a Griffith material.
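
The ratios quoted above can be reproduced from the average strengths reported for the two
irons tested. The following minimal sketch uses only the strengths stated in the text; the
function name is illustrative.

```python
def strength_ratio_k(compressive_psi: float, tensile_psi: float) -> float:
    """Ratio K of compressive strength to tensile strength."""
    if tensile_psi <= 0:
        raise ValueError("tensile strength must be positive")
    return compressive_psi / tensile_psi


# Average strengths reported in the text, psi.
K_HIGH_SILICON = strength_ratio_k(74_800.0, 10_700.0)
K_NODULAR = strength_ratio_k(77_200.0, 60_200.0)

if __name__ == "__main__":
    print(f"high-silicon iron: K = {K_HIGH_SILICON:.2f}")
    print(f"nodular iron:      K = {K_NODULAR:.2f}")
```

The computed values round to 7.0 and 1.28, in agreement with the figures given in the text.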


Theories of Failure 


General Limitations. Various theories have been proposed to 
account for failure of materials under biaxial stresses [14, 15]. 
The more commonly used and best known theories have as- 
sumed a material which is homogeneous and isotropic. These 
theories are often formulated in terms of principal stresses, but 
the significant stress is the mean shearing stress at a particular 
point in the body where failure occurs. This mean shearing stress
is the octahedral shear stress, which is proportional to the effective
stress and to the square root of the distortion energy; therefore
these three criteria, with the constant factors removed, cannot be
discriminated one from another in application.
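
The equivalence of the three criteria can be verified numerically. The sketch below uses
the standard definitions of effective (von Mises) stress, octahedral shear stress, and
distortion energy per unit volume for a linear elastic isotropic material. The elastic
constants are placeholder values chosen for illustration and are not taken from this
paper, and all names are hypothetical.

```python
import math

E_PSI = 25.0e6   # assumed Young's modulus (placeholder, not from the paper)
NU = 0.28        # assumed Poisson's ratio (placeholder, not from the paper)


def _sum_sq_diff(s1: float, s2: float, s3: float) -> float:
    """Sum of squared principal stress differences."""
    return (s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2


def effective_stress(s1: float, s2: float, s3: float) -> float:
    """Effective (von Mises) stress."""
    return math.sqrt(0.5 * _sum_sq_diff(s1, s2, s3))


def octahedral_shear_stress(s1: float, s2: float, s3: float) -> float:
    """Shear stress on the octahedral plane."""
    return math.sqrt(_sum_sq_diff(s1, s2, s3)) / 3.0


def distortion_energy(s1: float, s2: float, s3: float) -> float:
    """Distortion strain energy per unit volume for the assumed E and nu."""
    return (1.0 + NU) / (6.0 * E_PSI) * _sum_sq_diff(s1, s2, s3)


if __name__ == "__main__":
    for state in [(10_000.0, 0.0, 0.0), (8_000.0, -20_000.0, 0.0), (5_000.0, 5_000.0, -30_000.0)]:
        s_eff = effective_stress(*state)
        ratio_oct = octahedral_shear_stress(*state) / s_eff
        ratio_energy = distortion_energy(*state) / s_eff ** 2
        print(f"{state}: tau_oct/sigma_eff = {ratio_oct:.4f}, U_d/sigma_eff^2 = {ratio_energy:.3e}")
```

For every stress state the first ratio equals sqrt(2)/3 and the second equals
(1 + nu)/(3E), so a limit placed on any one of the three quantities fixes the other two.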

Fig. 7 Fracture theories for nodular iron


Corrections for Inhomogeneity and Anisotropy. Inhomogeneity and
anisotropy are present in many engineering materials. Theories
of failure which were proposed to account for the behavior of
isotropic and homogeneous materials must be modified to be
reasonably applicable to such materials as cast iron, concrete, or
glass.

Cast iron can be considered as a material composed of a matrix 
of carbon steel with interspersed graphite. Under comparatively 
low tensile stresses the graphite flakes act as sharp notches which 
grow to cause brittle failure. Under high compressive stresses 
cast iron sustains loads equal to those of carbon steel. This model 
proposed by Grassi and Cornet [5] has been formulated by J. C. 
Fisher [6, 17]. The ratio of fracture strength in compression to
fracture strength in tension has been designated as a stress
concentration factor $K$, equivalent to the effect of an ellipsoidal
cavity, of appropriate geometry, in the material. Equations
predicting failure of homogeneous isotropic materials are simply
modified by multiplying tensile stresses by $K$. Detailed explanations
and examples have been published (J. C. Fisher; Cornet
and Grassi; et al.) [17, 9, 6, 10, 11].
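
The modification described above, multiplying tensile stresses by $K$ before applying a
criterion for homogeneous isotropic material, can be sketched for plane stress as follows.
This is a minimal illustration that assumes the distortion energy (effective stress)
criterion is used in all quadrants with the compressive strength as the limit. It is not
a reproduction of the published formulation, whose details may differ. The strengths are
the averages reported here for high-silicon iron, and the function names are hypothetical.

```python
import math

TENSILE_PSI = 10_700.0       # average tensile strength, high-silicon iron
COMPRESSIVE_PSI = 74_800.0   # average compressive strength, high-silicon iron
K = COMPRESSIVE_PSI / TENSILE_PSI  # stress concentration factor, about 7.0


def notch_modified(stress_psi: float, k: float = K) -> float:
    """Multiply tensile (positive) stresses by k; leave compressive stresses unchanged."""
    return k * stress_psi if stress_psi > 0 else stress_psi


def utilization(s1_psi: float, s2_psi: float) -> float:
    """Modified effective stress divided by compressive strength (plane stress).

    A value of 1.0 or more indicates predicted fracture under the assumed criterion.
    """
    a = notch_modified(s1_psi)
    b = notch_modified(s2_psi)
    effective = math.sqrt(a * a - a * b + b * b)
    return effective / COMPRESSIVE_PSI


if __name__ == "__main__":
    print(f"pure tension at tensile strength:         {utilization(TENSILE_PSI, 0.0):.3f}")
    print(f"pure compression at compressive strength: {utilization(0.0, -COMPRESSIVE_PSI):.3f}")
    print(f"tension 5000 psi with compression 30000:  {utilization(5_000.0, -30_000.0):.3f}")
```

By construction the sketch returns 1.0 at both uniaxial strengths, which is the property
that lets a single criterion span the tension and compression quadrants.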

Griffith Theory. The Griffith theory [18, 19] is particularly
designed to account for failure in brittle materials, and it postulates
that microcracks, which are present in the materials, will grow
when the free energy of the system is reduced by the growth of
the cracks. The surface energy increases and the strain energy
decreases as a crack grows. Griffith's formulas for failure of a
brittle material with minute flaws are


(a) If $3S_1 + S_3 > 0$, $S_1 = S_0$
(b) If $3S_1 + S_3 < 0$, $(S_1 - S_3)^2 + 8S_0(S_1 + S_3) = 0$


Here $S_1$, $S_2$, $S_3$ are the principal stresses, and $S_0$ is a constant
depending only on the properties of the material and the dimensions
of the crack. The Griffith theory predicts a strength in
compression eight times the strength in pure tension.
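
The two Griffith conditions can be evaluated directly, and the eight-to-one ratio follows
by setting $S_1 = 0$ in condition (b). The sketch below assumes tension is positive and
$S_1 \geq S_3$. For illustration only, $S_0$ is set equal to the 10,700-psi tensile strength
reported here for high-silicon iron. The function names are hypothetical.

```python
def griffith_fails(s1: float, s3: float, s0: float) -> bool:
    """Return True if the Griffith conditions predict fracture.

    s1 and s3 are the largest and smallest principal stresses (tension positive);
    s0 is the Griffith constant, equal to the strength in pure tension.
    """
    if s1 < s3:
        s1, s3 = s3, s1
    if 3.0 * s1 + s3 > 0.0:
        # Condition (a): fracture when the largest principal stress reaches s0.
        return s1 >= s0
    # Condition (b): fracture on or outside the curve (s1 - s3)^2 + 8*s0*(s1 + s3) = 0.
    return (s1 - s3) ** 2 + 8.0 * s0 * (s1 + s3) >= 0.0


def griffith_compressive_strength(s0: float) -> float:
    """Uniaxial compressive strength from condition (b) with s1 = 0."""
    return 8.0 * s0


if __name__ == "__main__":
    s0 = 10_700.0  # illustrative: tensile strength of the high-silicon iron, psi
    print(f"predicted compressive strength: {griffith_compressive_strength(s0):,.0f} psi")
    print("fails at 0.99*s0 tension: ", griffith_fails(0.99 * s0, 0.0, s0))
    print("fails at 7.99*s0 compression:", griffith_fails(0.0, -7.99 * s0, s0))
    print("fails at 8.01*s0 compression:", griffith_fails(0.0, -8.01 * s0, s0))
```

With this choice of $S_0$ the predicted compressive strength is 85,600 psi, compared with
the measured average of 74,800 psi, consistent with the observed ratio of 7.0 rather
than 8.0.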

Application of Failure Theories to Cast Iron. Examination of data
available on the failure of cast iron (Figs. 4-7) indicates that
the maximum stress theory will not apply in quadrant II. The
maximum shear theory, with its simplicity and ease of application,
may be a useful rough approximation in quadrants I and II
(Figs. 6, 7). The maximum strain theory fits poorly. The maxi- 
mum strain energy theory represents the experimental data well 
(Figs. 6, 7). The maximum distortion energy theory, the effec- 
tive stress theory, and the octahedral shear stress theory apply 
and since they differ from each other by constant ratios, these 
three theories cannot be distinguished one from another (Figs. 
6, 7). 

Failure of metal is stochastic in occurrence, but few investiga- 
tions of failure under combined stresses present sufficient data to 
permit statistical inference. Theories discussed here all depend 
on failure values determined experimentally in pure tension and 
in pure compression. Not only is there considerable scatter in the 
relatively few values given for pure tension and pure compres- 
sion, but there is also question, with some validity, whether pure 
compression actually was obtained for the investigations studied, 
since buckling failure may be involved.

Under the circumstances it may be concluded that experimental 
data obtained for the fracture of cast iron under biaxial stresses 
are consistent with a stress concentration modified criterion of 
failure based on the maximum distortion energy theory or its 
equivalent effective stress or octahedral shear stress theories. For 
engineering purposes, failure can be predicted, under combined 
stresses at least in quadrants I and II, with the accuracy with 
which the results in pure tension and pure compression have been 
obtained. This stress concentration modified criterion is applica- 
ble to materials ranging from a brittle high-silicon cast iron of 
negligible ductility to as-cast nodular iron with sufficient duc- 
tility that there is a question whether buckling failure may not 
be involved in pure compression. Under buckling conditions the 
state of stress at failure is complex, unknown, and possibly inde- 
terminate. The agreements obtained are therefore remarkable. 

The Griffith theory is not applicable to metals, in general, since 
plastic flow energy is usually 100 to 1000 times as great as the 
surface energy (Ref. [15], p. 59). The shape of the fracture curve 
predicted by the Griffith theory resembles the fracture curve ob- 
tained for cast iron, but the ratio of compression to tension 
strengths, which would be 8.0 for a Griffith material, is less than 4 
for cast irons. The high-silicon iron, with a ratio of 7.0 closely 
approaches the curve for a brittle material predicted by the 
Griffith theory. 

Good approximations to the data may be obtained by assuming 
a normal stress criterion for fracture in the first quadrant and 
upper part of the second, and assuming a shear stress law prevails 
under compression loading (Ref. [15], pp. 72,74; [20]). Such as- 
sumptions seem somewhat arbitrary; also they fail to predict 
satisfactorily the locus of failure in the third quadrant for which 
there is some data [6]. 

The theory formulated quantitatively by J. C. Fisher [6, 17], in
which a stress concentration factor K is utilized, employs for
convenience the maximum shear stress theory for failure in the first
quadrant and the maximum distortion energy theory in the sec- 
ond quadrant (Fig. 4). Based upon this investigation, it may be 
concluded that the most appropriate criterion of fracture of 
brittle materials under combined stresses is this analysis, since 
it applies and is both conservative and convenient. 

Although the maximum strain energy theory approximates the 
experimental data (Figs. 6, 7), there is some question regarding 
its significance in terms of fracture stress for conditions of
negligible strain such as might occur under triaxial stress or in
materials with negligible ductility. On the other hand, the
maximum distortion energy theory and its equivalents, effective
stress theory and octahedral shear stress theory, remain 
physically significant as strain approaches zero. 

In the notch factor modified criterion for fracture of brittle 
materials, the distortion energy criterion may therefore be re- 
placed by an effective stress or by an octahedral shear stress cri- 
terion, if preferred. 
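
The interchangeability of the three criteria can be checked numerically. The sketch below, in Python, evaluates the effective stress, the octahedral shear stress, and the distortion energy for one biaxial stress state. The stress values, the modulus E, and Poisson's ratio are illustrative assumptions rather than data from this investigation. Because the three quantities differ only by constant factors, a failure limit placed on any one of them defines the same locus.

```python
import math

def effective_stress(s1, s2, s3=0.0):
    """Effective (von Mises) stress from the three principal stresses."""
    return math.sqrt(0.5 * ((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2))

def octahedral_shear(s1, s2, s3=0.0):
    """Shear stress on the octahedral plane."""
    return math.sqrt((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 3.0

def distortion_energy(s1, s2, s3=0.0, E=20.0e6, nu=0.25):
    """Distortion strain energy per unit volume (E and nu are assumed values)."""
    return (1.0 + nu) / (6.0 * E) * ((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2)

# Hypothetical biaxial state in quadrant II: tension 30,000 psi, compression 60,000 psi.
s1, s2 = 30000.0, -60000.0
s_eff = effective_stress(s1, s2)
tau_oct = octahedral_shear(s1, s2)
u_d = distortion_energy(s1, s2)

print(round(s_eff), round(tau_oct))
print(tau_oct / s_eff)        # constant factor sqrt(2)/3
print(u_d / s_eff ** 2)       # constant factor (1 + nu)/(3E)
```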

Variation in Material During Stressing. Early studies on the failure 
of brittle materials under combined stresses emphasized, perhaps 
unduly, the virtual elimination of necking which permitted 
greater assurance in determining the state of stress at fracture 
and the reduction of strain-hardening effects [5]. For materials 
such as high-silicon iron these advantages may still be present, 
but even a gray cast iron shows appreciable ductility under high 
compression stresses, and nodular iron shows strain-hardening 
effects. 

An even more serious problem arises from the behavior of an 
inhomogeneous structure on stressing. Examination of the 
yield envelope for pearlitic nodular iron [10] shows a material 
which yields at about 42,000 psi in axial tension, in axial com- 
pression, and in tangential tension. Such behavior characterizes 
a relatively isotropic material. As cast iron undergoes increasing 
stress the graphite undergoes separation and cracking, and there 
are pronounced orientation effects as cracks in the graphite orient 
preferentially transverse to the tension stress [11]. There is also 
an appreciable decrease in density, due to creation of voids, and a 
change in Poisson’s ratio. At fracture the material is quite 
anisotropic. Thus a comprehensive theory must start with yield- 
ing of an initially isotropic material, work-hardening and develop- 
ment of structural anisotropy with plastic deformations, and final 
fracture of an anisotropic material. 

It has been shown by Clough and Shank [10] that even under 
strains as small as 1/2 per cent in tension, nodular iron decreases
in density, due to voids which are created by the separation of a
thin shell of graphite, at the matrix boundary, from the rest of the
graphite nodule. Hence it is not surprising that analyses of
biaxial stress fracture based on correlating stress with strain, plastic
work, or similar functions have yielded families of curves, rather 
than a single curve. Indeed it has been stated that structure 
sensitive properties are not amenable to simple singular mathe- 
matical solutions [21]. 

Limitations of Existing Theories. There is at present a great need 
for an approach to the problems of fracture under combined 
stresses based on dislocation theory. There are different kinds of 
fracture, such as cleavage and shear [15, 21]. Fracture is known 
to be a two-stage process involving, first, the nucleation of a crack, 
and second, the growth of a crack. Under the influence of stress, 
dislocations move. Groups of dislocations pile up at barriers 
such as grain boundaries and induce local tensile stresses which 
approach the theoretical cleavage strength of the material. 

Studies by Wadsworth and Thompson [22], and by Harries and 
Smith [23], have shown that under cyclic stress at low stress 
cracks nucleate and develop on the surface, because flow occurs 
successively on slip planes that intersect each other near a free
surface. Under static tensile stress, ductile metals fracture first
near the center of the cross section of tensile test specimens, 
rather than on the surface [24, 19, 15]. Since ductile metal ten- 
sile test specimens show considerable necking prior to fracture, 
there may be combined stresses operating on the infinitesimal 
volume where the crack nucleates and propagates. 

In a heterogeneous microstructure such as that of cast iron 
it should be easy, theoretically, to find nuclei for crack initiation 
where graphite meets ferrite at the metal surface or within the 
structure. There is a great physical change from an iron to a 
carbon matrix at the interface, and there must be inherently a 
great concentration of stress locally, due to residual stresses
subsequent to solidification, since there are wide differences in
coefficient of thermal expansion, modulus of elasticity, and
ductility of the two solids [25]. It is possible in some cases, such as
in high-silicon cast iron, to conceive of microcracks being present
at the surface and in the material due to these stresses on
solidification of the melt; such alloys would then approach the behavior
of a Griffith material in fracture. Nonmetallic inclusions in a
metallic matrix may be regarded as microcracks, from the
standpoint of stress analyses [15]. It would not be easy to distinguish
between an actual microcrack and an inherent zone of weakness 
at the interface between a nonmetallic phase and a metallic struc- 
ture; for example, a heat-treatment sufficient to relieve gross 
residual stresses could not eliminate the mismatch at a boundary 
of two dissimilar structures such as hexagonal lattice graphite and 
body centered cubic lattice ferrite, nor could it effect welding at
an actual microcrack. 


Conclusions 


This investigation has covered the fracture under combined 
stresses of materials ranging from relatively ductile nodular iron 
to brittle high-silicon iron. Data obtained from this study and 
from the pertinent literature have been evaluated with respect to 
the various fracture theories. Essentially the stress system in- 
volved may be related to, or formulated in terms of, an average 
shearing stress acting on an infinitesimal volume in the body of the 
solid. The effect of structural factors may be summed in terms of 
a notch concentration factor K based on hypothetical and fic- 
titious cavities within and at the surface of the material. Equa- 
tions may then be obtained which conform to the data available, 
as far as engineering purposes are involved and considering the 
statistical limitations of the data. 

It is not yet possible to examine the microstructure and chemi- 
cal analysis of a material such as cast iron and evaluate K, the 
stress concentration factor, from such data. It has not even been 
possible to look at the graphite flakes of two gray cast irons and 
therefrom estimate values of K, except qualitatively. 

It may still be helpful to continue phenomenological investiga- 
tions in order to obtain even greater assurance for engineering 
design purposes. Fortunately, the stress concentration factor K 
may be taken simply as the ratio of compressive strength to ten- 
sile strength, for the materials investigated. It is possible that, 
for structures such as plastic laminated glass fabric tubes, the 
stress concentration factor K may depend on the axis of reference 
selected. It is also possible that phenomenological studies of
fatigue under combined stresses are warranted, since the
concept of a stress concentration factor K may be useful in the study
of failure under cyclic combined stress. 

The term “brittle” has been used here largely in the sense that 
a brittle material can sustain without failure stresses of greater 
magnitude in compression than in tension. It is believed that the 
ultimate comprehension and analysis of the physical phenomena 
of fracture of such brittle materials must come from increased 
understanding and application of dislocation theory.


Dynamic Synthesis of Higher-Order,
Optimum Saturating Systems


A new technique for the synthesis of optimum switching criteria for higher-order
saturating systems is investigated in this paper. Basically, the technique consists of
generating a linear switching criterion that is equivalent to the optimum switching
criterion for each state of the system. Thus as a trajectory of the system is traced out,
the optimum forcing function is determined point by point, resulting in the optimum
trajectory for the system. The synthesis procedure has been proved experimentally by
simulation of a third-order system on an IBM 797 digital computer.


Problem Delineation 


The class of nonlinear problems to be discussed in
the paper is described by Fig. 1 in which an ideal saturation 
constraint is placed on the input to a linear, time-invariant plant. 
The process to be controlled consists of a linear portion and a 
nonlinear portion, each independent of the other. The nonlinear 
portion has zero memory. The input to the process, m’(t), is 
controlled by a computing device which can be either continuous 
(analog computer) or discrete (digital computer) in character. 

The optimum system is defined as that system, subject to the
saturation constraint on the input to the plant, which returns to
equilibrium, $e_1(t) = e_2(t) = \ldots = e_n(t) = 0$, in minimum time
from a given set of initial conditions, $\{e_i(0)\}$. A restriction placed
on the problem is that the set of input variables $\{r_i(t)\}$ belongs to
the same class as the set of output variables $\{x_i(t)\}$; i.e., the set
$\{r_i(t)\}$ and the set $\{x_i(t)\}$ are both described by the same set of
differential equations, and further that the forcing function
for the set $\{r_i(t)\}$ is identically zero. Thus, within this framework,
the set $\{e_i(t)\}$ represents a redefinition of the set $\{x_i(t)\}$ and the
autonomous behavior of the system, $r_1(t) = r_2(t) = \ldots = r_n(t) = 0$,
is sufficient to determine completely the dynamic
characteristics of the system. From this point of view it is necessary only
to consider the autonomous behavior of the system from an
arbitrary set of initial conditions, $\{x_i(0)\}$.

The linear plant is represented by a set of n linear differential
equations with constant coefficients. In vector form, these
equations are:

$$\dot{x}(t) = A\,x(t) + m(t)\,f \qquad (1)$$


where


x(t) = "state" of the plant at time t
A = n × n matrix describing the plant
m(t)f = vector description of the forcing function


The vector description of the dynamics of a linear plant follows
conventional practice [1, 2]² and leads directly to the description
of the dynamics of the plant in Euclidean n-space, $\epsilon^n$, the state
space. The solution of equation (1) is

$$x(t) = G(t)\,x(0) + \int_0^t m(\tau)\,G(t-\tau)\,f\,d\tau \qquad (2)$$

where

x(0) = initial state of the plant
$G(t) = e^{At} = I + At + \frac{(At)^2}{2!} + \ldots$ = impulse response matrix of the linear plant
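
As an illustration of the series for G(t), the following Python sketch sums the leading terms of the matrix exponential for a small plant. The matrix A used here (eigenvalues 0 and -1) is a hypothetical example, not the plant of this paper, and the helper names are invented for the sketch.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def impulse_response_matrix(A, t, terms=40):
    """G(t) = I + At + (At)^2/2! + ..., truncated after `terms` terms."""
    n = len(A)
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in G]            # current term (At)^k / k!
    for k in range(1, terms):
        term = mat_mul(term, A)
        term = [[v * t / k for v in row] for row in term]
        G = [[G[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return G

# Hypothetical second-order plant: one integration and one time constant.
A = [[0.0, 1.0],
     [0.0, -1.0]]

G = impulse_response_matrix(A, 0.5)
print(G)                                     # closed form: [[1, 1 - e^-t], [0, e^-t]]

# G(t) G(-t) should return the identity matrix.
print(mat_mul(G, impulse_response_matrix(A, -0.5)))
```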


From equation (1) the equilibrium condition for the plant is 
equivalent to x(t) = 0. For this condition, equation (2) can be 
solved to yield 


² Numbers in brackets designate References at end of paper.

$$-x(0) = \int_0^t m(\tau)\,G(-\tau)\,f\,d\tau, \qquad 0 < t \qquad (3)$$
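
The step from equation (2) to equation (3) can be filled in as follows, using only the property $G(t-\tau) = G(t)\,G(-\tau)$ of the matrix exponential and the fact that G(t) is nonsingular:

```latex
\begin{aligned}
0 = x(t) &= G(t)\,x(0) + \int_0^t m(\tau)\,G(t-\tau)\,f\,d\tau \\
         &= G(t)\left[\,x(0) + \int_0^t m(\tau)\,G(-\tau)\,f\,d\tau\right] \\
\Longrightarrow\quad -x(0) &= \int_0^t m(\tau)\,G(-\tau)\,f\,d\tau .
\end{aligned}
```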
If equation (3) can be satisfied, then the system reaches
equilibrium in time t from an arbitrary initial state x(0) by
application of the $m(\tau)$ which satisfies equation (3). Addition of the
saturation constraint modifies equation (3):

$$-x(0) = \int_0^t m(\tau)\,G(-\tau)\,f\,d\tau, \qquad 0 < t, \qquad |m(\tau)| \le 1 \qquad (4)$$

By definition, the optimum system is that system for which
equation (4) is satisfied for minimum t for all x(0) in $\epsilon^n$. Thus the
optimization problem is rephrased in terms of a minimization of t
in equation (4) for a given x(0). Let this minimum t be designated $t_0$.

Application of the technique presented by Bellman, Glicksberg, 
and Gross [3] to equation (4) shows that for the optimum saturat- 
ing system: 


$$|m(\tau)| = 1, \qquad 0 \le \tau \le t_0, \qquad \text{all } x(0)$$


i.e., the maximum of the forcing function is applied for all x(0).
Thus the optimum saturating system as defined here is an
optimum contactor system. Further, for linear plants with real
eigenvalues the maximum number of switchings of the optimum
forcing function is (n − 1), where n is the
order of the linear plant; this result is explored more extensively 
in [4], in which switching curves are developed by use of the Bell- 
man, Glicksberg, and Gross technique. 
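
A small numerical check of the contactor property can be written in Python under stated assumptions. The second-order plant below (eigenvalues 0 and -1 in diagonal form, f = (1, 1)) and the switching times are hypothetical choices, not taken from the paper. An initial state is built from equation (4) with m = +1 followed by m = -1, that is, with (n - 1) = 1 switching, and the plant is then propagated in closed form to confirm that it arrives at the origin at the final time.

```python
import math

def initial_state(t1, t0):
    """x(0) from eq. (4) with m = +1 on [0, t1] and m = -1 on [t1, t0].

    Assumed plant: dx1/dt = m, dx2/dt = -x2 + m, so G(-tau) f = (1, e^tau).
    """
    neg_x1 = t1 - (t0 - t1)
    neg_x2 = (math.exp(t1) - 1.0) - (math.exp(t0) - math.exp(t1))
    return (-neg_x1, -neg_x2)

def propagate(x, m, dt):
    """Exact response of the assumed plant to a constant input m over dt."""
    x1, x2 = x
    decay = math.exp(-dt)
    return (x1 + m * dt, decay * x2 + m * (1.0 - decay))

t1, t0 = 1.0, 1.5                      # hypothetical switching and final times
x = initial_state(t1, t0)
x = propagate(x, +1.0, t1)             # full positive forcing
x = propagate(x, -1.0, t0 - t1)        # single switching to full negative forcing
print(x)                               # both components vanish to rounding error
```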

For linear plants with complex eigenvalues, the maximum
number of switchings is dependent on both n and x(0), and
application of the foregoing technique proves impractical for plants of
higher than second order [4]. However, if x(0) is restricted to a
region around the origin in $\epsilon^n$ such that there are at most (n − 1)
switchings of the optimum forcing function, then the
theory developed in subsequent sections in this paper is equally 
applicable to plants containing complex eigenvalues. Henceforth 
it will be assumed that, for plants containing complex eigen- 
values, x(0) is restricted as stated, thus allowing extension of 
the theory to linear plants with complex eigenvalues. 


Theory of Optimum Switching Surfaces 

The results of the preceding section can be used to develop the 
concept of optimum switching surfaces for the n-dimensional sys- 
tem with real eigenvalues. Under the assumption that equation 
(4) can be satisfied,* this equation can be rewritten for the opti- 
mum system in the following form: 


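$$-x(0) = m\left[\int_0^{t_1} G(-\tau)\,f\,d\tau - \int_{t_1}^{t_2} G(-\tau)\,f\,d\tau + \ldots \pm \int_{t_{n-1}}^{t_n} G(-\tau)\,f\,d\tau\right], \qquad |m| = 1, \qquad 0 \le t_1 \le t_2 \le \ldots \le t_n = t_0 \qquad (5)$$

$$-x(0) = m_1\int_0^{t_1} G(-\tau)\,f\,d\tau + m_2\int_{t_1}^{t_2} G(-\tau)\,f\,d\tau + \ldots + m_n\int_{t_{n-1}}^{t_n} G(-\tau)\,f\,d\tau, \qquad |m_i| = 1, \qquad 0 \le t_1 \le t_2 \le \ldots \le t_n = t_0 \qquad (6)$$


*In most practical situations, equation (4) can be satisfied. If,
however, the eigenvalues of the linear plant have positive real parts,
then equation (4) can be satisfied only in a bounded region near the
origin of $\epsilon^n$. Proof of this statement can be found in reference [2] and
in Appendix 1, reference [4].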


Implicit in the expansions of equation (5) and equation (6) is the
notion that the system reaches equilibrium from an initial
condition, x(0), in (n − 1) switchings. For an nth-order system, a
maximum of n terms is needed on the right-hand side of equation
(6) for specification of an arbitrary x(0) in $\epsilon^n$.

We now define from equation (6) a sequence $C_1(t_1), C_2(t_2), \ldots, C_n(t_n)$
as follows:

$$C_1(t_1) = m_1\int_0^{t_1} G(-\tau)\,f\,d\tau, \qquad 0 < t_1, \ \text{all } t_1, \qquad |m_1| = 1$$

$$C_2(t_2) = C_1(t_1) + m_2\int_{t_1}^{t_2} G(-\tau)\,f\,d\tau, \qquad 0 < t_1 < t_2, \ \text{all } t_1, t_2, \qquad |m_2| = 1 \qquad (7)$$

$$\vdots$$

$$C_n(t_n) = C_{n-1}(t_{n-1}) + m_n\int_{t_{n-1}}^{t_n} G(-\tau)\,f\,d\tau$$

For $x(0) \in C_i(t_i)$ and $x(0) \notin C_{i-1}(t_{i-1})$, x(0) lies in the subset from
which equilibrium is reached in (i − 1) switchings. We note that
$C_n(t_n)$ generates the Euclidean n-space, $\epsilon^n$, whereas $C_i(t_i)$, $1 \le i < n$,
generates a subset of $\epsilon^n$. Further, by this construction
$C_1(t_1) \subset C_2(t_2) \subset \ldots \subset C_n(t_n) \subset \epsilon^n$. Since for each i, $1 < i \le n$, by
definition $C_i(t_i) \cap C_{i-1}(t_{i-1}) = C_{i-1}(t_{i-1})$, there exists a set
$\{x_i(0)\}$ such that $\{x_i(0)\} \in C_i(t_i)$ and $\{x_i(0)\} \notin C_{i-1}(t_{i-1})$. Thus the
n-space is partitioned and a set $\{x_i(0)\}$ associated with each
partition.

Assume that $x(0) \in C_i(t_i)$, $x(0) \notin C_{i-1}(t_{i-1})$. Then according to
equation (7), $x_{i-1} = \ldots + m_{i-1}\int_{t_{i-2}}^{t_{i-1}} G(-\tau)\,f\,d\tau$ and
equation (6) can be rewritten:

$$-x(0) = x_{i-1} + m_i\int_{t_{i-1}}^{t_i} G(-\tau)\,f\,d\tau, \qquad |m_i| = 1 \qquad (8)$$

where $x_{i-1} \in C_{i-1}(t_{i-1})$. By the method of induction it can be shown
that the expansion expressed in equation (8) is unique for all i,
$1 \le i \le n$. Thus for any $x(0) \in \epsilon^n$, equation (8) is unique and
the optimum forcing function, sgn [$m_i$], is uniquely determined.

Equation (8) provides the basis for the construction of the
optimum switching surfaces. For each $x(0) \in C_i(t_i)$, the subset
$C_{i-1}(t_{i-1})$ is constructed which divides $C_i(t_i)$ into two parts. Thus
a determination of which part of $C_i(t_i)$ describes x(0) suffices to
determine sgn [$m_i$]. A geometrical interpretation of the process
is that each x(0) is constructed from a series of subsets each of
which divides the next succeeding subset into two parts: If x(0) is
located in the ith subset, then the (i − 1)th subset is used to
determine in which part of the ith subset x(0) lies, hence to determine
sgn [$m_i$].


Generation of Optimum Switching Surfaces 


The generation of the optimum switching surfaces of the pre- 
ceding section is accomplished by means of a logic developed from 
sampled-data techniques. Consider the linear sampled-data sys- 
tem in Fig. 2 consisting of a zero-order sample and hold coupled 
to a continuous plant described by the impulse response matrix, 
G(t).

As in the preceding section, to satisfy the equilibrium
condition of the system from an arbitrary x(0), the following
equation must be satisfied:

$$-x(0) = \int_0^t m(\tau)\,G(-\tau)\,f\,d\tau \qquad (3\text{-bis})$$

The zero-order sample and hold unit of the foregoing system
restricts m(t) to be constant over each sampling period T, so that
m(t) is piecewise continuous and can be written:

$$m_k = m(kT) = m(t), \qquad (k-1)T < t \le kT \qquad (9)$$

where k represents the kth sample of m(t). Thus equation (3-bis)
becomes

$$-x(0) = m_1\int_0^{T} G(-\tau)\,f\,d\tau + m_2\int_{T}^{2T} G(-\tau)\,f\,d\tau + \ldots + m_n\int_{(n-1)T}^{nT} G(-\tau)\,f\,d\tau$$

$$-x(0) = m_1 h_1(T) + m_2 h_2(T) + \ldots + m_n h_n(T) \qquad (10)$$


where the $h_i(T)$ are implicitly defined in equation (10). Under the
condition that the linear plant has distinct eigenvalues and that
operation of the plant is restricted to a region around the origin
as outlined in the first section (for plants with complex
eigenvalues), it can be shown that the $h_i(T)$ are independent vectors
in $\epsilon^n$.

Since the n independent $h_i(T)$ form a basis in n-space, a
maximum of n terms on the right-hand side of equation (10) is needed
to specify any x(0) in $\epsilon^n$. The inversion of equation (10) gives
the $m_i$ in terms of linear combinations of the $x_i(0)$, as in equation (11):

$$\begin{aligned}
m_1 &= a_{11}(T)\,x_1 + a_{12}(T)\,x_2 + \ldots + a_{1n}(T)\,x_n \\
m_2 &= a_{21}(T)\,x_1 + a_{22}(T)\,x_2 + \ldots + a_{2n}(T)\,x_n \\
&\;\;\vdots \\
m_n &= a_{n1}(T)\,x_1 + a_{n2}(T)\,x_2 + \ldots + a_{nn}(T)\,x_n
\end{aligned} \qquad (11)$$


where the initial measured value of $x_i(0)$ is $x_i$, and $m_1$ is the
initial forcing function. Since it is possible for any x(0) = x to
generate the $m_i$ according to equation (11), for each x the initial
forcing function, $m_1$, can be generated. Thus the feedback
coefficients of Fig. 2 are determined from equation (11) by
inspection and are $a_{11}(T), a_{12}(T), \ldots, a_{1n}(T)$. This procedure is
similar to the procedure developed by Kalman and Bertram in
[5].

Fig. 2 Linear sampled-data system
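
To make the inversion concrete, the Python sketch below evaluates the vectors $h_1(T)$ and $h_2(T)$ of equation (10) for a second-order plant and solves for $m_1$ and $m_2$. The plant (eigenvalues 0 and -1 in diagonal form with f = (1, 1)), the sample state, and the function names are assumptions made for illustration; the paper itself does not specify a second-order plant.

```python
import math

def h_vectors(T):
    """h_k(T) = integral of G(-tau) f over [(k-1)T, kT], k = 1, 2.

    For the assumed plant G(-tau) f = (1, e^tau).
    """
    return [(T, math.exp(k * T) - math.exp((k - 1) * T)) for k in (1, 2)]

def forcing_coefficients(x, T):
    """Solve -x = m1*h1(T) + m2*h2(T) for (m1, m2) by Cramer's rule."""
    (a, c), (b, d) = h_vectors(T)        # columns h1 = (a, c), h2 = (b, d)
    det = a * d - b * c
    r1, r2 = -x[0], -x[1]
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)

x = (1.0, 0.5)                           # hypothetical measured state
T = 1.0
m1, m2 = forcing_coefficients(x, T)
h1, h2 = h_vectors(T)

# Reconstruct -x(0) from the expansion as a consistency check.
print(m1, m2)
print(m1 * h1[0] + m2 * h2[0], m1 * h1[1] + m2 * h2[1])
```

Here $|m_1| > 1$ for the chosen T, which indicates that this x(0) lies outside the linear subspace for that sampling period.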

If the output of the sample and hold is restricted by the 
idealized saturation characteristic of Fig. 1, then equation (10) is 
modified as in equation (12): 


$$-x(0) = m_1 h_1(T) + m_2 h_2(T) + \ldots + m_n h_n(T), \qquad |m_i| \le 1 \qquad (12)$$


The right-hand side of equation (12) represents a convex subset
in $\epsilon^n$; x(0) is restricted to this subset [4]. This subset increases
with increasing T in the sense that each $|h_i(T)|$ and each
component of $h_i(T)$ increases with increasing T. Thus, as T is
increased, the subset expands from the origin of $\epsilon^n$ in such a manner
as to sweep out the region in which x(0) is located.

To illustrate the principles involved in the generation of the 
optimum switching surfaces, we consider a second-order system 
with real distinct eigenvalues. According to the theory
developed in the preceding section the impulse-response matrix,
G(t), determines the optimum switching curve. For this
second-order system, only one term, $C_1(t_1)$, is required from the
sequence of equation (7) to describe the optimum switching curve:

$$C_1(t_1) = m\int_0^{t_1} G(-\tau)\,f\,d\tau, \qquad 0 < t_1, \ \text{all } t_1, \qquad |m| = 1 \qquad (13)$$


The optimum switching curve is as shown in Fig. 3. The 
switching curve divides the two-dimensional space into two 
regions, each region specifying the optimum forcing function cor- 
responding to a given set of states of the system. 

The subspace indicated by equation (12) is now superimposed
on Fig. 3. The subspace is constructed for fixed T and for n = 2
according to equation (14):

$$-x(0) = m_1\int_0^{T} G(-\tau)\,f\,d\tau + m_2\int_{T}^{2T} G(-\tau)\,f\,d\tau = m_1 h_1(T) + m_2 h_2(T), \qquad |m_i| \le 1 \qquad (14)$$


Fig. 3 Optimum switching curve for typical second-order plant

Fig. 4 Superposition of linear subspace onto Fig. 3


The relationship between the subspace and the optimum switch- 
ing curve is illustrated in Fig. 4. 

By construction $h_1(T)$ and $h_2(T)$ fall on the optimum switching
curve as shown. Inspection of Fig. 4 reveals that the vector $h_1(T)$
establishes a linear switching criterion for the system in exactly
the same fashion that the optimum switching curve establishes
its optimum switching criterion; further, for points lying
exactly on any boundary point of the subspace, the two switching
criteria are identical. Since points on the boundary of the
subspace are identified uniquely by the expansion of equation (14),
the optimum forcing function can be found directly by a
determination of sgn [$m_2$]. In this manner, by defining for each x(0)
a linear switching criterion exactly equivalent to the optimum
switching criterion, the optimum switching curve is implicitly
constructed.

The optimum trajectory for this system is traced out point by
point by adjusting the subspace for each x(0) such that x(0) lies
on a boundary of the subspace. For each x(0) the optimum forcing
function is found by measuring sgn [$m_2$]. For the special case
$m_2 = 0$ in equation (14), x(0) lies directly on the switching curve
and is described by $h_1(T)$; for this case, sgn [$m_1$] uniquely
determines the optimum forcing function.

These same ideas are extended to higher-order physical plants
[4]. For higher-order plants, hyperplanes are established in n-space
to provide the equivalent optimum switching criterion for each
x(0). As a trajectory is traced out, x(0) successively passes from
the kth switching surface to the (k − 1)th switching surface,
$1 < k \le n$, according to equation (7). For each x(0) a linear subspace
is generated such that x(0) lies on a boundary point of the
subspace. It can be shown that for x(0) lying on the kth switching
surface the first k vectors of equation (12) suffice to describe x(0)
for x(0) lying on a boundary point of the subspace. Hence the
hyperplane formed from the first (k − 1) vectors of equation (12)
provides the linear switching criterion. By induction from the
second-order case, the linear switching criterion is equivalent to
the optimum switching criterion. Thus a determination of
sgn [$m_k$] from the expansion of equation (12) gives the optimum
forcing function, in accordance with equation (5), as

$$\text{sgn}\,[m_k] = \text{sgn}\,[m] \qquad (15)$$

From Fig. 4, it is apparent that regions near the switching curve
can be described by two values of T satisfying all the foregoing
conditions. Except for the case where x(0) falls directly on the
switching curve, these two values of T are stable values. The
optimum switching criterion is the same for either value of T,
hence no special precaution is necessary in the implementation of
this technique with respect to second-order systems. No similar
statement can be made for higher-order systems concerning the
multiplicity of the values of T, although the higher-order system
investigated in the section, "Experimental Results," presented
no difficulty in this connection.

The theory developed in the paper circumvented the problem 
of the construction of a binary switching function in n-space. 
The dynamic generation of linear subspace and the correspond- 
ing logical determination of the optimum forcing function provide 
a powerful technique for the implementation of the optimum n- 
dimensional saturating system. In the following section it will 
be shown how the technique is especially amenable to digital 
computation and hence provides appealing motivation for the 
use of high-speed digital computers as the computing device for 
this class of problems. 


Synthesis of Optimum Saturating System 


Implementation of the theory of the preceding section neces- 
sitates: 


1 Adjustment of the linear subspace by variation of T such 
that x(0) lies on a boundary point of the subspace. 

2 Provision of a logic for the determination of sgn [m] of 
equation (15). 


These operations are accomplished by a subspace generator and 
a sgn selector, respectively. Because of the highly nonlinear 
character of the operations, it is expedient to implement them by 
digital computation. Hence the presentation here is slanted 
toward the application of a digital computer for the implementa- 
tion of the technique, although, conceivably, the same operations 
can be performed by an analog computer. 

If x(0) is to be placed on a boundary point of the linear
subspace, it is necessary that equation (12) be satisfied and further
that $|m_i| = 1$ for at least one i, i = 1, 2, ...., n. This statement
can be verified geometrically [4]. For x(0) located within the linear
subspace, $|m_i| < 1$ for all i, i = 1, 2, ...., n, in equation (12);
conversely, for x(0) located outside of the linear subspace, $|m_i| > 1$
for at least one i, i = 1, 2, ...., n. Since the convex subspace of
equation (12) increases from the origin with the parameter T, it
is possible to control T, hence adjust the boundaries of the
subspace, so as to place x(0) directly on a boundary point of the
subspace. A practical implementation of these ideas involves the
simulation of a T, the generation of the $m_i$'s, and a monitoring of
the $m_i$'s so as to control T. Let the simulated T be designated $T_0$.

The subspace generator is conceived to implement these ideas. 
The subspace generator performs the following functions: 


1 Samples, with a period T, the state variables $x_1, x_2, \ldots, x_n$.

2 Generates the $a_{ij}(T_0)$'s of equation (11) from a simulated $T_0$.

3 Generates the $m_i$'s according to equation (11).

4 Selects $|m_{\max}|$, the largest $|m_i|$ from the set $\{m_i\}$, then
compares $|m_{\max}|$ with the saturation level, 1.

5 Generates a new $T_0$ from the difference $(|m_{\max}| - 1)$.

6 Reduces $(|m_{\max}| - 1) < \epsilon$ by an iterative (feedback)
procedure on steps 2-5; $\epsilon$ is a small positive quantity.

7 Repeats the process for the next sample of the state
variables.
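
The seven steps above can be sketched in Python for a second-order case. Everything specific in the sketch is an assumption: the plant (eigenvalues 0 and -1 in diagonal form, f = (1, 1)), the search interval for $T_0$, and the use of bisection in place of the feedback adjustment of step 5, whose exact law is not given here. The sketch also assumes that x(0) lies outside the subspace at the lower limit of the interval and inside it at the upper limit.

```python
import math

def forcing_coefficients(x, T0):
    """m1, m2 from -x = m1*h1(T0) + m2*h2(T0) for the assumed plant.

    h_k(T0) = (T0, e^(k*T0) - e^((k-1)*T0)), so the inversion is explicit.
    """
    q = -x[0] / T0                        # m1 + m2
    r = -x[1] / math.expm1(T0)            # m1 + e^T0 * m2
    m2 = (r - q) / math.expm1(T0)
    return (q - m2, m2)

def subspace_generator(x, t_low=1e-6, t_high=50.0, eps=1e-9):
    """Adjust T0 until x lies on the subspace boundary, |m_max| = 1.

    Bisection stands in for the paper's feedback step; it needs
    |m_max| > 1 at t_low and |m_max| < 1 at t_high.
    """
    while t_high - t_low > eps:
        t_mid = 0.5 * (t_low + t_high)
        m_max = max(abs(m) for m in forcing_coefficients(x, t_mid))
        if m_max > 1.0:
            t_low = t_mid                 # subspace too small: enlarge T0
        else:
            t_high = t_mid                # x inside subspace: shrink T0
    return t_high, forcing_coefficients(x, t_high)

T0, (m1, m2) = subspace_generator((1.0, 0.5))   # hypothetical sampled state
print(T0, m1, m2)                               # max(|m1|, |m2|) is close to 1
```

Bisection is chosen only because it is simple to verify; any iteration that drives $(|m_{\max}| - 1)$ toward zero would serve the same purpose.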


The subspace generator, implementing this procedure, is il- 
lustrated for a second-order system in Fig. 5. The subspace 
generator, in toto, represents a complex, highly nonlinear device; 
however, as indicated previously, the individual operations are 
readily implemented digitally, thus placing a premium on the 
use of a digital computer as the simulating mechanism for the 
subspace generator. 

In a practical realization, all calculations in the subspace
generator are performed between samplings. The subspace
generator tracks the physical system and determines pointwise the
optimum trajectory, each point on the trajectory being separated
in time by T seconds. As T → 0, the sampled system
approaches its optimum continuous counterpart.

The sgn selector determines the optimum forcing function for
each x(0). The logic for the sgn selector is also most conveniently
implemented on a digital computer. A table of logic for the
second-order system of Fig. 5 is as illustrated in Table 1. The parameter
$\alpha$ is a small positive quantity. Table 1 is constructed in
accordance with the theory developed in the section, "Generation
of Optimum Switching Surfaces."


Table 1 Table of sgn selector logic for system of Fig. 5

Sgn selector logic              Forcing function
$\alpha \le |m_2| \le 1$        $m = -\text{sgn}\,[m_2]$
$\alpha > |m_2|$                $m = \text{sgn}\,[m_1]$


The digital nature of the calculations for the subspace generator
and the sgn selector necessitates the use of the two parameters, $\epsilon$
and $\alpha$. Both parameters, in essence, control a region in the
vicinity of the optimum switching curve in which the switching
criterion is identical to the criterion set up by the switching curve
[4]. This second-order example indicates another feature which
adds to the flexibility of the method: A decrease in the limit level
effectively "advances" the switching curve in space. Thus a
decrease in the limit-level control advances the switching time for
a given trajectory, hence compensates for any time delays in- 
herent in the subspace generator. This feature is of value in real 
time applications of the technique in which all calculations in the 
digital computer are performed between samplings. 
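
One possible reading of the sgn selector logic for the second-order case is sketched below in Python. The subscripts in the two branches (test on $|m_2|$, fall back to $m_1$) and the numerical value of $\alpha$ are assumptions made for this sketch. The inputs $m_1$ and $m_2$ are taken to come from a subspace generator that has already placed x(0) on the subspace boundary.

```python
def sgn(value):
    """Sign function returning +1, -1, or 0."""
    return (value > 0) - (value < 0)

def sgn_selector(m1, m2, alpha=0.05):
    """Forcing function m for a second-order system (assumed table reading).

    If |m2| is at least alpha, x(0) is off the switching curve and the
    forcing function opposes the sign of m2; otherwise x(0) is treated
    as lying on the switching curve and the sign of m1 is applied.
    """
    if abs(m2) >= alpha:
        return -sgn(m2)
    return sgn(m1)

print(sgn_selector(1.0, -0.42))    # off the curve: m = -sgn[m2] = +1
print(sgn_selector(-1.0, 0.01))    # within alpha of the curve: m = sgn[m1] = -1
```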

The ideas developed for the second-order system can be ex- 
tended immediately to higher-order systems. For higher-order 
systems, the basic implementation remains simple, but the re- 
quired computer facilities increase drastically. However, with 
the improved computer facilities becoming available today, no 
formal difficulty is envisioned in the implementation of the proce- 
dure to more complicated processes. 


Experimental Results 


To verify the theory developed in the preceding sections, a 
third-order system was simulated on an IBM-797 digital com- 
puter. The system, in Jordan normal form [1], consists of two 
integrations and one time constant as shown in Fig. 6. This 
system can be related to any other system having a plant with 
the same distinct eigenvalues by a simple transformation [4]. The 
equations necessary for the description of the system are 


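$$G(t) = \begin{bmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-at} \end{bmatrix}, \qquad f = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \qquad (16)$$


The $a_{ij}(T_0)$'s for the system are determined from equation (17)
and are as specified in equation (18), which is the inversion of
equation (17):

$$-x(0) = m_1\int_0^{T_0} G(-\tau)\,f\,d\tau + m_2\int_{T_0}^{2T_0} G(-\tau)\,f\,d\tau + m_3\int_{2T_0}^{3T_0} G(-\tau)\,f\,d\tau \qquad (17)$$

$$\begin{aligned}
m_1 &= -\frac{e^{aT_0}}{T_0^2\,(e^{aT_0}-1)}\,x_1 - \frac{e^{aT_0}(3e^{aT_0}-5)}{2T_0\,(e^{aT_0}-1)^2}\,x_2 - \frac{a}{(e^{aT_0}-1)^3}\,x_3 \\
m_2 &= \frac{e^{aT_0}+1}{T_0^2\,(e^{aT_0}-1)}\,x_1 + \frac{e^{2aT_0}-5}{2T_0\,(e^{aT_0}-1)^2}\,x_2 + \frac{2a}{(e^{aT_0}-1)^3}\,x_3 \\
m_3 &= -\frac{1}{T_0^2\,(e^{aT_0}-1)}\,x_1 + \frac{3-e^{aT_0}}{2T_0\,(e^{aT_0}-1)^2}\,x_2 + \frac{-a}{(e^{aT_0}-1)^3}\,x_3
\end{aligned} \qquad (18)$$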
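
The inversion of equation (17) can be cross-checked numerically. The Python sketch below assumes a plant with two integrations and one time constant, for which $G(-\tau)f = (-\tau,\ 1,\ e^{a\tau})$, and uses hypothetical values for a and $T_0$. It builds the three vectors of equation (17), solves for the $m_i$ by Cramer's rule, and confirms that the expansion reproduces -x(0). All function names are invented for the sketch.

```python
import math

def h_vector(k, T0, a):
    """Integral of G(-tau) f = (-tau, 1, e^(a*tau)) over [(k-1)T0, k*T0]."""
    lo, hi = (k - 1) * T0, k * T0
    return (-(hi ** 2 - lo ** 2) / 2.0,
            hi - lo,
            (math.exp(a * hi) - math.exp(a * lo)) / a)

def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c3[1] * c2[2])
            - c2[0] * (c1[1] * c3[2] - c3[1] * c1[2])
            + c3[0] * (c1[1] * c2[2] - c2[1] * c1[2]))

def forcing_coefficients(x, T0, a):
    """Solve -x = m1*h1 + m2*h2 + m3*h3 for (m1, m2, m3)."""
    h1, h2, h3 = (h_vector(k, T0, a) for k in (1, 2, 3))
    rhs = tuple(-v for v in x)
    d = det3(h1, h2, h3)
    return (det3(rhs, h2, h3) / d, det3(h1, rhs, h3) / d, det3(h1, h2, rhs) / d)

a, T0 = 1.0, 1.0                          # hypothetical plant and period values
x = (0.2, -0.4, 1.0)                      # hypothetical initial state
m = forcing_coefficients(x, T0, a)
hs = [h_vector(k, T0, a) for k in (1, 2, 3)]
residual = [sum(m[k] * hs[k][i] for k in range(3)) + x[i] for i in range(3)]
print(m)
print(residual)                           # each entry vanishes to rounding error
```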
The subspace generator calculates the $m_i$'s, then processes them
as indicated in Fig. 6. The state variables are sampled with a
period T; $T_0$ is simulated internally in the subspace generator.
The logic governing the sgn selector is given in Table 2.
The experimental procedure consisted of placing known initial
conditions on each of the state variables, then observing the
response of the system to these initial conditions. Typical results
are illustrated in Figs. 7, 8, and 9. In each case the results
substantiated the theory and the optimum trajectory was generated.

Fig. 8 Experimental response of system of Fig. 6 to initial condition


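Table 2 Table of sgn selector logic for system of Fig. 6

Sgn selector logic                                                  Forcing function
$\alpha \le |m_3| \le 1$                                            $m = \text{sgn}\,[m_3]$
$\alpha > |m_3|$, $\alpha \le |m_2| \le 1$                          $m = -\text{sgn}\,[m_2]$
$\alpha > |m_3|$, $\alpha > |m_2|$, $\alpha \le |m_1| \le 1$        $m = \text{sgn}\,[m_1]$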


It was found, as expected, that near the origin in n-space, in 
the region of small error, small limit cycles existed; these are a 
result of the sampled nature of the process. To eliminate these 
small limit cycles, the system switches its mode of operation in 
this region to an alternate type of compensation, normally a 
linear-finite-settling-time design. Thus the over-all system 
classifies as a dual-mode type system. A monitoring of $T_0$ of the
subspace generator provides the criterion for switching from the
nonlinear compensation to the linear compensation, with
switching occurring for $T_0 < nT$. The points at which this switching
occurs are indicated in the trajectories in Figs. 7, 8, and 9.

Fig. 9 Experimental response of system to initial condition


Conclusion 


This study provides a technique whereby the optimum trajec- 
tory for the higher-order saturating system is determined compu- 
tationally. In the present study, the computer simulation on the 
IBM 797 of the optimum third-order saturating system gives 
very encouraging results. The ultimate goal of this study, of 
course, is to extend the dynamic synthesis procedure to higher- 
order, real-time applications, using a high-speed digital computer
as the compensating element of the system. Clearly, such
applications are of practical as well as scientific significance.

The future course of investigation follows directly from this 
study: 


1 Practical application of the technique to higher-order, op- 
timum saturating systems; performance characteristics of these 
optimum systems. 

2 Implementation of the technique for the study of systems 
with complex eigenvalues. 

3 The effect on over-all system performance of introducing 
engineering approximations to the actual plant. 

4 Extension of the technique to other nonlinear processes. 

It is the hope of the author that this research has stimulated 
new ideas in the area of optimum saturating systems and that the 
ideas presented here provide impetus to the use of digital
techniques to the solution of this and other related nonlinear
problems.


Solution Space Approach to Optimal Control Problems


In this paper, we study the control of the dynamic system governed by the matrix differential equation, $\dot{\mathbf{x}} = \mathbf{F}\mathbf{x} + \mathbf{D}\mathbf{u}$, $\mathbf{x}(0) = -\mathbf{c}$, where the input vector u is constrained in amplitude. It is shown that in the discrete (sampled data) case: (a) The general optimal control problem can be formulated as a nonlinear programming problem amenable to treatment by techniques developed in the operations research field. (b) The specific time optimal control problem originally studied by Kalman is treated here using a different approach which yields well-known as well as new results.

Introduction 


During the past few years, the problem of optimal
control¹ of dynamic systems has received considerable attention.
In general terms, the problem is the determination of con- 
trollable inputs to a system subject to various constraints such 
that the output of the system corresponds as nearly as possible to 
some desired behavior according to a specific criterion. Solutions 
to this problem have taken various directions. One very general 
and fruitful approach has been extensively cultivated by Bellman,
Kalman, and Bertram in this country [1-4].² This is the
state-space, time-domain approach. A particular class of the 
general optimal control problem involves the minimization of 
time required to reduce a given dynamic system to rest from any 
arbitrary state, the so-called time optimal control problem. For 
the case of continuous systems, this problem was treated by 
Bushaw, Bellman, Bass, and LaSalle [5-8]. The last named 
reference also gives an account of the considerable volume of work 
by Krasovskii, Gamkrelidze, and Pontryagin in the USSR. For 
the case of discrete (sampled) systems, very little has been done 
besides the original paper of Kalman [9]. 

The purpose of this paper is twofold: (a) We wish to show that 
the general optimal control problem in the discrete case can be 
formulated as a problem in nonlinear programming, amenable 
to treatment by techniques developed in operations research. (b) 
By viewing the discrete time optimal control problem, as such, we 
are able to provide a different approach to the problem and 
derive certain additional properties concerning the solutions. 

It is to be emphasized at this point that despite the concerted
effort of many workers in this field, no general solution in the
practical sense has been found for the time optimal problem in
either the continuous or the discrete case. Similarly, this paper
does not claim such results. What we shall endeavor to show 
here is a different approach to the same problem which displays 
certain particular characteristics of the solution and which may 
lead to practical solutions for certain cases. 


Notations, Terminology, and Nomenclature 


All upper case boldface Roman or Greek letters indicate matrices. All lower case boldface Roman or Greek letters indicate vectors. Scalars are denoted by lower or upper case letters. The absolute value of c is denoted by |c|. When the | | is applied to a vector or matrix, the absolute value of each element is taken; i.e., $|\mathbf{U}| \le \mathbf{M}$ implies $|u_{ij}| \le m_{ij}$.

¹ The term “optimal control” is used in a strict sense here, to be defined later in the section. Existence of feedback as usually understood in the loose application of this term is not necessarily implied here.

² Numbers in brackets designate References at end of paper.

Contributed by the Instruments and Regulators Division of The American Society of Mechanical Engineers and presented at the Joint Automatic Control Conference, Cambridge, Mass., September 7-9, 1960. Manuscript received at ASME Headquarters, June 6, 1960. ASME Paper No. 60-JAC-11.

The state of a system is a set of numbers, called the state 
variables, which contain as much information regarding the past 
history of the system as is required for the calculation of the 
future behavior of the system (definition, reference [4]). If a 
dynamic system is simulated by an analog-computer program in- 
volving n integrators, then the outputs of these integrators con- 
stitute a set of such numbers. The state space is an n-dimensional 
vector space with the state variables as co-ordinates. 

A list of important symbols follows for easy reference:

x = state vector (n-vector)
F = differential transition matrix
u = input vector (m-vector)
Φ = transition matrix for x (n × n matrix)
d = desired behavior of state vector (n-vector)
M = limitation on input vector
A = step responses of system in one sampling period (n × m matrix)
H = response matrix (n × k matrix)
u_m = control sequence vector for mth input (k-vector)
D = input matrix (n × m matrix)


Problem Formulation 


General Optimal Control Problem. Consider the dynamic system
described by the vector-matrix differential equation

$$\dot{\mathbf{x}} = \mathbf{F}\mathbf{x} + \mathbf{D}\mathbf{u}, \qquad \mathbf{x}(0) = -\mathbf{c} \tag{1}$$

where
x = state vector of n elements
u = forcing vector of m elements where m < n
F = constant n × n matrix
D = constant n × m matrix
n = order of the system


Let d, the desired behavior of the state vector x, be known for
all time. Then we are interested in the determination of u(t) such
that f(d − x), some scalar function of the error, is minimized.

The problem stated in this form leads invariably to solutions 
with unbounded u unless additional restrictions are placed on the 
system or the inputs. The constraints adopted here, which reflect
practical restrictions on available power, are

$$|\mathbf{u}(t)| \le \mathbf{M}, \qquad t \ge 0 \tag{2}$$

Furthermore, in discrete cases u is piecewise constant:

$$\mathbf{u}(t) = \mathbf{u}(i), \qquad iT \le t < (i+1)T \tag{3}$$

where T is the sampling period which we shall henceforth set
equal to one for simplification.
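The solution of equation (1) can be written [10] as

$$\mathbf{x}(t) = -\boldsymbol{\Phi}(t)\mathbf{c} + \int_0^t \boldsymbol{\Phi}(t-\tau)\mathbf{D}\mathbf{u}\,d\tau \tag{4}$$

where Φ(t) is the solution of the matrix differential equation

$$\dot{\boldsymbol{\Phi}} = \mathbf{F}\boldsymbol{\Phi}, \qquad \boldsymbol{\Phi}(0) = \mathbf{I} \tag{5}$$

For a list of the properties of Φ(t) and its physical interpretation,
the reader is referred to [4]. Since u is piecewise constant we
can rewrite equation (4) after some manipulation as

$$\mathbf{x}(k) = -\boldsymbol{\Phi}^{k}\mathbf{c} + \sum_{i=0}^{k-1}\boldsymbol{\Phi}^{k-i-1}\mathbf{A}\mathbf{u}(i), \tag{6}$$

where $\boldsymbol{\Phi} = \boldsymbol{\Phi}(1)$, $k = t$, and

$$\mathbf{A} = \int_0^1 \boldsymbol{\Phi}(1-\tau)\mathbf{D}\,d\tau \tag{7}$$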
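The passage from the integral solution to the sampled sum can be checked numerically. The following is a minimal sketch, assuming a scalar plant dx/dt = a x + u with unit sampling period, so that the one-period transition factor is exp(a) and the one-period step response is (exp(a) - 1)/a. The function names and the numerical values are illustrative assumptions, not taken from the paper. The sketch compares the closed-form sum for x(k) with a step-by-step recursion.

```python
import math

def discretize_scalar(a):
    """Return (phi, step) for dx/dt = a*x + u sampled with unit period.

    phi  = transition factor over one period, exp(a)
    step = integral over [0, 1] of exp(a*(1 - tau)) dtau (one-period step response)
    """
    phi = math.exp(a)
    step = (phi - 1.0) / a if a != 0.0 else 1.0
    return phi, step

def state_by_recursion(a, c, u):
    """Advance x(i+1) = phi*x(i) + step*u(i), starting from x(0) = -c."""
    phi, step = discretize_scalar(a)
    x = -c
    for u_i in u:
        x = phi * x + step * u_i
    return x

def state_by_formula(a, c, u):
    """Evaluate -phi^k * c + sum over i of phi^(k-i-1) * step * u(i)."""
    phi, step = discretize_scalar(a)
    k = len(u)
    return -(phi ** k) * c + sum(phi ** (k - i - 1) * step * u[i] for i in range(k))

if __name__ == "__main__":
    controls = [1.0, -1.0, 0.5]  # hypothetical piecewise-constant input
    print(state_by_recursion(-0.5, 2.0, controls))
    print(state_by_formula(-0.5, 2.0, controls))
```

Both functions should agree to rounding error for any piecewise-constant input sequence.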


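Equation (6) for different values of k can be summarized in one
vector equation as

$$\mathbf{X} = \mathbf{C} + \boldsymbol{\Psi}\mathbf{U}, \tag{8}$$

where

$$\mathbf{X} = \begin{bmatrix}\mathbf{x}(1)\\ \mathbf{x}(2)\\ \vdots\\ \mathbf{x}(k)\end{bmatrix}\ (nk \times 1 \text{ matrix}), \qquad
\mathbf{C} = \begin{bmatrix}-\boldsymbol{\Phi}\mathbf{c}\\ -\boldsymbol{\Phi}^{2}\mathbf{c}\\ \vdots\\ -\boldsymbol{\Phi}^{k}\mathbf{c}\end{bmatrix}\ (nk \times 1 \text{ matrix}), \qquad
\mathbf{U} = \begin{bmatrix}\mathbf{u}(0)\\ \mathbf{u}(1)\\ \vdots\\ \mathbf{u}(k-1)\end{bmatrix}\ (mk \times 1 \text{ matrix}),$$

and

$$\boldsymbol{\Psi} = \begin{bmatrix}
\mathbf{A} & \mathbf{0} & \cdots & \mathbf{0}\\
\boldsymbol{\Phi}\mathbf{A} & \mathbf{A} & \cdots & \mathbf{0}\\
\vdots & \vdots & \ddots & \vdots\\
\boldsymbol{\Phi}^{k-1}\mathbf{A} & \boldsymbol{\Phi}^{k-2}\mathbf{A} & \cdots & \mathbf{A}
\end{bmatrix}\ (nk \times mk \text{ matrix})$$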

Consequently, the general problem of optimal control can be
stated as

$$\begin{aligned}
&\text{Minimize } f(\mathbf{G} - \mathbf{X})\\
&\text{subject to } \mathbf{X} = \mathbf{C} + \boldsymbol{\Psi}\mathbf{U}\\
&\text{and } |\mathbf{U}| \le \mathbf{M},
\end{aligned} \tag{P-1}$$

where

$$\mathbf{G} = \begin{bmatrix}\mathbf{d}(1)\\ \mathbf{d}(2)\\ \vdots\\ \mathbf{d}(k)\end{bmatrix}\ (nk \times 1 \text{ matrix})$$

³ Reference [4], pp. 415-416.


Problem (P-1) is a form of nonlinear programming with linear
constraints. Computational techniques [11] exist for the solution
of this general problem under suitable conditions of the
criterion function, f(G − X) (e.g., it is a convex function of its
arguments).

In cases where only one particular solution is desired for a given 
set of initial conditions (such as optimum programming of flight 
trajectories), this approach may be a very reasonable way of ob- 
taining numerical solutions especially when the constraints and 
criterion function are nonanalytic. Of course, when a general 
solution is desired, this method becomes rather inefficient. 

Time Optimal Control Problem. Often in practice, the control
system is restricted by the additional constraint

$$\mathbf{u}(i) = \begin{bmatrix}0\\ \vdots\\ 0\\ u_m(i)\end{bmatrix} \quad \text{for all } i; \tag{9}$$

i.e., control may be exerted at one input only. This is the usual
case treated in conventional control systems. Furthermore, we
may only be interested in the state of the system at instant k.
Specifically, we may wish to find the minimum value of k such
that d(k) = x(k). In this case, the general problem reduces to the
well-known time optimal control problem which takes this form:


Determine the minimum value of k such that

$$\mathbf{d}(k) = -\boldsymbol{\Phi}^{k}\mathbf{c} + \sum_{i=0}^{k-1}\boldsymbol{\Phi}^{k-i-1}\mathbf{a}\,u_m(i) \tag{P-2}$$

is satisfied subject to the constraints

$$|u_m(i)| \le M, \qquad 0 \le i < k$$

where a is the mth column of A; $u_m(i)$ is the mth element of
u(t) for i ≤ t < (i + 1).


Solution-Space Analysis 


A Geometrical Interpretation. The discrete problem as stated in
(P-2) was first solved by Kalman [9]. In the solution-space
approach to be described, we shall solve the same problem in a
different manner, and derive additional new results. Having
developed the method, we shall extend it in the next section to
cases with several inputs and shall indicate problems associated
with nonstationary systems.

For purposes of discussion, we shall make the additional,
standard assumption that d = 0 for t > 0. The case where d ≠ 0
can be handled similarly with increased complexity. Note that
the vector equation in (P-2) can now be written as

$$\sum_{i=0}^{k-1}\boldsymbol{\Phi}^{-(i+1)}\mathbf{a}\,u_m(i) = \mathbf{c} \tag{10}$$


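Equation (10) can be written in matrix form as

$$\mathbf{c} = \mathbf{H}\mathbf{u}_m, \tag{11}$$

where

$$\mathbf{H} = \left[\boldsymbol{\Phi}^{-1}\mathbf{a},\ \boldsymbol{\Phi}^{-2}\mathbf{a},\ \ldots,\ \boldsymbol{\Phi}^{-k}\mathbf{a}\right] \tag{12}$$

and

$$\mathbf{u}_m = \begin{bmatrix}u_m(0)\\ u_m(1)\\ \vdots\\ u_m(k-1)\end{bmatrix} \tag{13}$$

Notice $\mathbf{u}_m$ is a vector whose elements are values of the mth element
of u at different time instants. The optimal control problem (P-2)
can be restated as follows:


Find the minimum k such that

$$\mathbf{H}\mathbf{u}_m = \mathbf{c} \tag{14}$$

$$|\mathbf{u}_m| \le \mathbf{M} \tag{15}$$

is satisfied. (P-3)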

Disregarding for the moment the restriction on the size of the 
controlled input, then in general, k = n, if the determinant of H 
is not equal to zero; that is, a linear system of order n starting 
from any initial state can be brought to the zero state in exactly 
n steps, if no limitation is placed on the control force. This is the 
same conclusion obtained by Kalman and Bertram from a different
consideration [2]. The requirement that det(H) ≠ 0 is
guaranteed if the plant is completely controllable [11] (see Appendix
for a more detailed discussion).
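As an illustration of the unconstrained case k = n, the sketch below works through a double integrator (n = 2), which is an example chosen here and not one used in the paper. For this plant, with unit sampling period, the one-period transition matrix is [[1, 1], [0, 1]] and the one-period step response is [0.5, 1]; the two columns of H are then the step response premultiplied by the inverse transition matrix once and twice. All names are hypothetical.

```python
def deadbeat_controls(c):
    """Solve c = H u for the two control values of a double integrator.

    Assumed plant (illustrative only): x1' = x2, x2' = u, unit sampling period.
    Phi = [[1, 1], [0, 1]], step response a = [0.5, 1].
    Columns of H: Phi^-1 a = [-0.5, 1] and Phi^-2 a = [-1.5, 1].
    """
    h = [[-0.5, -1.5],
         [1.0, 1.0]]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]  # equals 1, so H is nonsingular
    u0 = (h[1][1] * c[0] - h[0][1] * c[1]) / det
    u1 = (-h[1][0] * c[0] + h[0][0] * c[1]) / det
    return [u0, u1]

def simulate(c, u):
    """Run the sampled double integrator from x(0) = -c under controls u."""
    x = [-c[0], -c[1]]
    for u_i in u:
        x = [x[0] + x[1] + 0.5 * u_i, x[1] + u_i]
    return x

if __name__ == "__main__":
    c = [3.0, -2.0]
    u = deadbeat_controls(c)
    print(u, simulate(c, u))  # the final state should be the origin
```

With no bound on the input, any initial state of this second-order plant is returned to the origin in exactly two steps, in line with the statement above.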

Now let us consider the system with the constraint (15).
Consider a k-space with k mutually orthogonal axes

$$u_m(0),\ u_m(1),\ \ldots,\ u_m(k-1)$$

The solution of (14) with k = n represents a point P (0-flat) in
this k-space which we shall call the solution space. The restriction
$|\mathbf{u}_m| \le \mathbf{M}$ on the other hand defines a hypercube M centered
around the origin in the solution space. If point P lies inside the
cube, we have the completely linear case as described in the
foregoing and the problem is solved. If not, then k must be made
greater than n. Suppose we let k = n + 1. Then equation (14)
defines a set of n equations with n + 1 unknowns. We can solve
for

$$u_m(0),\ u_m(1),\ \ldots,\ u_m(n-1)$$

in terms of $u_m(n)$. Geometrically, this defines a line L (one-flat)
in the (n + 1)-dimensional space. If the line pierces the cube M, 
then the line segment inside the cube certainly satisfies equations 
(14) and (15). Thus it is the solution to our optimal control 
problem. In fact, we see that there actually exists an infinity of 
optimal solutions (represented by points along the line segment) 
each capable of driving the system to the desired state in n + 1 
steps. On the other hand, if the line L does not penetrate the 
cube M, then we must resort to at least n + 2 steps. In this 
case, the solutions of equation (14) define a plane (two-flat) in the 
solution space. We again look for the intersection of this plane 
with the cube. If an intersection exists, the portion of the 
plane which lies inside the cube contains in general an infinity of 
points. Each of these points is a solution to (P-3). Proceeding 
in this manner, we can see intuitively that every time we in- 
crease the dimension of the solution space, we are increasing the 
chance for intersection as well as the dimension of the
intersecting space [a (k − n)-flat]. Under suitable conditions, to be
discussed later, an intersection will result if enough steps are taken.

The fact that equation (14) indeed defines a (k − n)-flat in the
k-dimensional space can be seen if one rewrites equation (14) as
follows:

$$\begin{bmatrix}u_m(0)\\ \vdots\\ u_m(n-1)\end{bmatrix} = \mathbf{H}^{-1}\left[\mathbf{c} - \sum_{i=n}^{k-1}\boldsymbol{\Phi}^{-(i+1)}\mathbf{a}\,u_m(i)\right], \qquad u_m(i) = u_m(i),\quad i = n, \ldots, k-1 \tag{16}$$

where $\mathbf{H}^{-1}$ is the inverse of the first n columns of H.

Equation (16) expresses a point in the k-dimensional space as 
a linear function of (k − n) arbitrary variables $u_m(n), \ldots, u_m(k-1)$,
and thus is the parametric equation of a (k − n)-flat in a
k-dimensional space by definition of n-dimensional geometry
[12]. It was also pointed out that the existence of solutions of
the type of equation (14) under more general conditions has been
discussed by Penrose from a purely algebraic viewpoint in his
recent paper.⁴

Example. Consider a first-order system

$$\frac{dx}{dt} = ax + u, \qquad x(0) = -c, \qquad |u| \le M$$

In the notations of (P-3), this is

$$\mathbf{H} = \left[\int_0^1 e^{-at}\,dt,\ \ldots,\ e^{-a(k-1)}\int_0^1 e^{-at}\,dt\right]$$

Since each element in the H matrix can be calculated explicitly
we may simply indicate the matrix as

$$\mathbf{H} = [h(k),\ h(k-1),\ \ldots,\ h(1)], \qquad 1 \text{ by } k \text{ matrix}$$

Now, if no limitation is placed on u, then

$$u(0) = c/h(1)$$

However, if

$$|c/h(1)| > M$$

then we are forced to take at least two steps. From equation (14)
we have

$$c = h(2)u(0) + h(1)u(1)$$

Viewed geometrically, one of the two situations illustrated in Fig.
1 exists. In Fig. 1(b), an infinity of solutions exists as indicated
by the heavy line segment inside the square. If the case is as
shown in Fig. 1(a), no solution is possible and we must go to
3-space. This is illustrated in Fig. 2 where equation (14) can be
written as

$$c = h(3)u(0) + h(2)u(1) + h(1)u(2)$$


The solutions, if they exist, are in a plane represented by triangle 
A-B-C. We have again an infinity of solutions. These facts can 
be immediately generalized to the following: 

Theorem. For the linear system of (P-3), if an optimal solution 
exists in a minimum of k steps where k > n, then there exists 
either a single solution or an infinity of solutions. 


⁴ R. Penrose, “The Generalized Inverse of Matrix,” Proceedings of
Cambridge Philosophical Society, Cambridge, England, 1955-1956.

Fig. 1 Geometry of (n + 1) control steps

Fig. 2 Geometry of (n + 2) control steps


Proof. The proof follows directly from the definitions of n-di- 
mensional geometry. Since equation (14) defines an infinite, con- 
nected subset of points, S, in the k-dimensional solution space, 
any solution that satisfies (P-3) involves an intersection of this 
subset S with the infinite connected subset M defined by the 
hypercube. This intersection contains either one point or an in- 
finite number of points, each of which is a solution. 
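For a first-order plant the intersection test has a simple closed form, which the following sketch uses to find the minimum number of steps. It assumes the scalar system dx/dt = a x + u with a nonzero, unit sampling period, and a symmetric bound on u; the function name and the numbers are illustrative assumptions. With one state, the flat is the hyperplane where the weighted sum of the controls equals c, and it meets the cube exactly when |c| does not exceed the bound times the sum of the absolute column values.

```python
import math

def min_steps(a, c, bound, k_max=50):
    """Smallest k for which sum(h_i * u_i) = c has a solution with |u_i| <= bound.

    Columns for the scalar plant (a != 0): h_i = exp(-a*(i+1)) * (exp(a) - 1)/a.
    Returns None if no k up to k_max works.
    """
    step = (math.exp(a) - 1.0) / a
    reach = 0.0  # largest |c| that k steps can cancel
    for k in range(1, k_max + 1):
        reach += bound * abs(math.exp(-a * k) * step)
        if abs(c) <= reach:
            return k
    return None

if __name__ == "__main__":
    print(min_steps(-1.0, 1.0, 1.0))  # small offset, stable plant
    print(min_steps(-1.0, 5.0, 1.0))  # larger offset needs more steps
    print(min_steps(1.0, 2.0, 1.0))   # unstable plant, offset too large
```

When the inequality holds strictly, the hyperplane cuts through the interior of the cube and there is an infinity of solutions; when it holds with equality, the only solution is a vertex of the cube.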

Existence of Solutions. So far we have shown an exhaustive and 
purely conceptual method of constructing solutions to the problem
of (P-3). There is no guarantee of the existence of solutions.
Kalman first stated the existence theorem and derived a step-by- 
step method for constructing a particular solution of the problem. 
Roughly speaking, Kalman stated that for stable systems, a solu- 
tion exists for (P-3) while for unstable systems, a solution exists 
only for very limited cases. In terms of the solution-space ap- 
proach, we can obtain a simple interpretation of the theorem and 
method. 

Consider the last n columns of the matrix H. A little reflection 
will confirm the following two statements: 


1 The last n columns of H are linearly independent just as the 
first n columns are independent, and thus form a basis for an 
n-dimensional space. 

2 The matrix $\boldsymbol{\Phi}^{-k} = \boldsymbol{\Phi}(-k)$ becomes unbounded when k → ∞
for a stable system; i.e., for a system whose matrix F, equation
(1), possesses no eigenvalue with positive real parts. Thus
the last n columns of H span the whole n-dimensional space
as k → ∞.


From the two foregoing statements, it is a simple matter to
conclude that a stable system with input constraints can always
be brought back to equilibrium from any initial state by adopting
as control policy no control force for the first (k − n) steps and
some appropriate control forces for the last n steps as k → ∞.
On the other hand, the reverse is true for an unstable system.
If all the roots of the system have positive real part, then the
matrix $\boldsymbol{\Phi}^{-k}$ approaches zero when k → ∞. In fact, the sum of
all columns of H as k → ∞ must remain finite since

$$\sum_{i=1}^{\infty}\boldsymbol{\Phi}^{-i}$$

is finite. Consequently, $\mathbf{H}\mathbf{u}_m$, which is merely a bounded linear
combination of the columns of H, equations (10) and (11), must
also remain finite; i.e.,

$$\mathbf{H}\mathbf{u}_m < \infty, \qquad k \to \infty$$

Thus equation (14) cannot be satisfied for arbitrary c. For
every given M, an unstable system can only be brought to rest if
we start with sufficiently small initial conditions. A similar
argument can be made when only some of the roots have positive real
parts.
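The bounded-reach argument for unstable plants can be made concrete in the scalar case. The sketch below assumes dx/dt = a x + u with a > 0, unit sampling period, and |u| ≤ bound; the names are hypothetical. Summing the absolute column values over k steps gives a reachable radius of (bound/a)(1 − exp(−a k)), which approaches bound/a and never exceeds it.

```python
import math

def reachable_radius(a, bound, k):
    """Largest |c| that can be driven to zero in k steps (scalar plant, a != 0).

    Sums bound * |h_i| with h_i = exp(-a*(i+1)) * (exp(a) - 1)/a for i = 0..k-1.
    """
    step = (math.exp(a) - 1.0) / a
    return sum(bound * abs(math.exp(-a * (i + 1)) * step) for i in range(k))

if __name__ == "__main__":
    for k in (1, 2, 5, 20):
        # For a = 1 and bound = 1 the values climb toward the limit bound/a = 1.
        print(k, reachable_radius(1.0, 1.0, k))
```

For this unstable example an initial offset with |c| at or above bound/a cannot be removed in any finite number of steps, however large.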

To visualize the construction of an optimal solution, we start 
by considering n control steps. The n-dimensional cube in the 
solution space (defined in the section, “‘A Geometrical Interpreta- 
tion’’) is transformable to the state space through the nonsingular 
matrix H. Since H is a linear transformation, its effect on the 
cube can only be that of rotation and rescaling. The hypercube 
M becomes a skewed cube in the state space. This skewed cube 
then represents the set of initial states reducible to zero in n
control steps. In particular, the $2^n$ vertexes of the cube correspond
to the $2^n$ combinations of control steps where full control force is
employed. To see the effect of the (n + 1)th step, we rewrite
equation (14) or (10) as

$$\sum_{i=0}^{n-1}\boldsymbol{\Phi}^{-(i+1)}\mathbf{a}\,u_m(i) = \mathbf{c} - \boldsymbol{\Phi}^{-(n+1)}\mathbf{a}\,u_m(n) \tag{17}$$

The left-hand side of equation (17) defines the skewed cube in the
state space as described previously; i.e., the set of all states
reducible to zero in n steps. On the right-hand side, if we assume c
to be the set of all initial states reducible to zero in (n + 1) steps,
then they can be found by displacing the surface of the skewed
cube by the vector $\pm\boldsymbol{\Phi}^{-(n+1)}\mathbf{a}M$. The set of all points in state
space determined this way defines the boundary of the set of all
states reducible to zero in (n + 1) steps, since any point in this
set can be reduced by the $\boldsymbol{\Phi}^{-(n+1)}\mathbf{a}\,u_m(n)$ term on the right-hand
side of equation (17) to a point inside or on the boundary of the
skewed cube in one step; thus only n more steps are needed to
reduce the system to zero state. Having determined the boundary
of points in state space reducible to zero in (n + 1) steps, we can
proceed in similar fashion to determine the boundary for (n +
2) steps. This, of course, is the step-by-step procedure of Kalman
derived earlier by a different method.

Minimization of Control Energy. In many cases of interest, the 
square of the controlled variable u,, has the dimension of energy. 
It is often of interest to minimize the total energy spent. In the 
solution space this has a very simple interpretation. Among all 
the points on the intersection of the (k — n)-flat defined by equa- 
tion (14) with the hypercube, we must look for the point which is 
closest to the origin. Minimizing the distance in solution space 
is equivalent to minimizing total energy. Thus once the existence 
of a solution has been ascertained, we may wish to solve the
additional problem

$$\text{Minimize } \sum_{i=0}^{k-1}\left(u_m(i)\right)^2 \text{ such that } \mathbf{H}\mathbf{u}_m = \mathbf{c},\quad |\mathbf{u}_m| \le \mathbf{M} \tag{P-4}$$

are satisfied and where k is the result of the minimum time
problem. This is a standard problem in quadratic programming.
Straightforward solution techniques are discussed in [13] and 
will not be repeated here. 
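For a first-order plant the minimum-energy problem can be solved without a general quadratic programming routine. The sketch below is not the technique of [13]; it is an assumed shortcut valid only when H has a single row. In that case the optimality conditions give each control as a common multiplier times its column value, clipped to the bound, and the multiplier can be found by bisection because the reached value is nondecreasing in it. All names are hypothetical.

```python
def min_energy_controls(h, c, bound, iterations=200):
    """Minimize sum(u_i^2) subject to sum(h_i*u_i) = c and |u_i| <= bound.

    Single-state case only. Returns None when the constraint set is empty.
    """
    if abs(c) > bound * sum(abs(h_j) for h_j in h):
        return None  # the flat misses the cube, so more steps are needed

    def clip(value):
        return max(-bound, min(bound, value))

    def reached(lam):
        # Value of sum(h_i*u_i) when u_i = clip(lam*h_i); nondecreasing in lam.
        return sum(h_j * clip(lam * h_j) for h_j in h)

    low, high = -1.0, 1.0
    for _ in range(200):          # widen the bracket downward if needed
        if reached(low) <= c:
            break
        low *= 2.0
    for _ in range(200):          # widen the bracket upward if needed
        if reached(high) >= c:
            break
        high *= 2.0
    for _ in range(iterations):   # bisection on the multiplier
        mid = 0.5 * (low + high)
        if reached(mid) < c:
            low = mid
        else:
            high = mid
    lam = 0.5 * (low + high)
    return [clip(lam * h_j) for h_j in h]

if __name__ == "__main__":
    print(min_energy_controls([1.0, 2.0], 2.5, 10.0))  # bound inactive
    print(min_energy_controls([1.0, 2.0], 2.5, 0.9))   # second control saturates
```

When the bound is inactive the result is the point of the flat closest to the origin; when it is active, the saturated controls sit on faces of the cube and the remaining ones share the residual.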

An obvious possibility which presents itself is that of trading 
control energy for number of control steps. In other words, the 
criterion in (P-4) may be changed to

$$\text{Minimize } \sum_{i=0}^{k-1}\left(u_m(i)\right)^2 + \lambda(k)$$

where $k \ge k_{\min}$, and λ(k) = arbitrary weighting factor.

Problems of this type can be handled by repeated solution of 
(P-4) with different k. In practical systems, n seldom exceeds 
10. An exhaustive search procedure is certainly within the 
capability of modern-day high-speed computers. 


Cases of Multiple Inputs and Time-Varying Systems 


Multiple-input Case. Consider the case where there are p 
inputs. Then instead of equation (14) we must now refer to the 
more general equation (6). Assuming again that d = 0, we have,

$$\mathbf{c} = \sum_{j \in P}\sum_{i=0}^{k-1}\boldsymbol{\Phi}^{-(i+1)}\mathbf{a}_j\,u_j(i) \tag{18}$$

where
$\mathbf{a}_j$ = jth column of A
P = subset of the integers from 1 to m representing p controllable
inputs

A matrix equation similar to equation (14) can be written for
equation (18). In this case, the resultant H-matrix is composed
of several parts, one of which is the original H in equation (14).
This is as follows:

$$\mathbf{c} = \left[\mathbf{H}_1,\ \mathbf{H}_2,\ \ldots,\ \mathbf{H}_p\right]\begin{bmatrix}\mathbf{u}_1\\ \mathbf{u}_2\\ \vdots\\ \mathbf{u}_p\end{bmatrix} \tag{19}$$

There are kp variables,

$$u_1(0),\ \ldots,\ u_p(k-1)$$

Disregarding for the moment the input constraint, we wish to
select n variables from the total such that the maximum of their
arguments (which indicate the number of control steps or time)
is a minimum. Once this is done, we set the other variables equal
to zero and solve for the n chosen variables in terms of c and thus
obtain the optimal solution. However, there is an additional
complication. Note that, in general, the matrix $\mathbf{H}_j$ will contain
rows which are composed entirely of zero elements. This is
because certain states are not affected by certain inputs. We
cannot solve equation (18) for any n variables in terms of the
remaining (kp − n) variables. They must be so selected that the
portion of the H-matrix associated with the n variables is
nonsingular.

A straightforward procedure can be used for the selection of a 
set of n variables out of the total kp variables. 


(1) Consider the p vectors, $\boldsymbol{\Phi}^{-1}\mathbf{a}_1, \boldsymbol{\Phi}^{-1}\mathbf{a}_2, \ldots, \boldsymbol{\Phi}^{-1}\mathbf{a}_p$, in
equation (19). They must be linearly independent, for otherwise
the p inputs are dependent and a linear change
of variables can reduce the number of inputs. This change of
variables can always be made for equation (18) before this process
of selection.

(2) Consider the sequence of vectors, $\boldsymbol{\Phi}^{-2}\mathbf{a}_1, \ldots, \boldsymbol{\Phi}^{-2}\mathbf{a}_p$;
$\boldsymbol{\Phi}^{-3}\mathbf{a}_1, \ldots, \boldsymbol{\Phi}^{-3}\mathbf{a}_p$; $\boldsymbol{\Phi}^{-4}\mathbf{a}_1, \ldots$ one by one in turn and test each for
linear dependence on the vectors selected so far. If it is dependent, discard
it from consideration; if not, include it with the group of vectors
in (1).

(3) Repeat this until a total of n − p vectors have been selected
from the sequence in (2).

(4) The variables associated with the n column vectors so 
selected are one set of optimum solution variables. 
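The selection steps above amount to scanning the candidate columns in order of increasing step index and keeping each one that is independent of those already kept. The sketch below shows one way to do this with a Gram-Schmidt residual test. The function name, the tolerance, and the sample vectors are assumptions made for illustration; the columns would in practice be the vectors of equation (19) listed in the order given in steps (1) and (2).

```python
import math

def select_independent(columns, n, tol=1e-9):
    """Return indices of the first n linearly independent vectors in `columns`.

    Each candidate is reduced against an orthonormal basis of the vectors kept
    so far; it is kept only if a non-negligible residual remains.
    """
    basis = []   # orthonormal vectors spanning the kept columns
    chosen = []  # indices of the kept columns, in scan order
    for index, column in enumerate(columns):
        residual = [float(v) for v in column]
        scale = math.sqrt(sum(v * v for v in residual))
        for q in basis:
            dot = sum(r * q_i for r, q_i in zip(residual, q))
            residual = [r - dot * q_i for r, q_i in zip(residual, q)]
        norm = math.sqrt(sum(r * r for r in residual))
        if norm > tol * scale:
            basis.append([r / norm for r in residual])
            chosen.append(index)
            if len(chosen) == n:
                break
    return chosen

if __name__ == "__main__":
    # Hypothetical candidate columns for a third-order plant.
    candidates = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    print(select_independent(candidates, 3))
```

If fewer than n indices are returned after all candidates are scanned, no nonsingular n-column portion exists among them.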


Remarks. The existence of a set of n independent column vectors
in equation (19) can be used as a definition for complete
controllability of a multi-input system, since this guarantees a
solution to time optimal control without input constraint. The
foregoing selection procedure, however, is not unique, because
when we solve for the n variables in equation (18) in terms of the
other (kp − n) variables, there is the possibility of setting a few
of these variables to values other than zero without having to
increase the number of control steps. From this consideration
we see that the solution of the discrete time optimal problem is 
generally nonunique in the case of multiple inputs even when no 
constraints on the sizes of inputs are imposed. This is different 
from the single-input case where the nonuniqueness comes as a 
result of the constraint. 

When input constraints are added, the solution-space concept 
described previously still applies. Of course, in this case, the 
hypercube in practice will have unequal sides. Furthermore, one 
is no longer restricted to move in one dimension, but p dimensions 
at a time. 

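Time-Varying System. When the F-matrix in equation (1) is
time-dependent, the solution corresponding to equation (4) is

$$\mathbf{x}(t) = -\boldsymbol{\Phi}(t, 0)\mathbf{c} + \int_0^t \boldsymbol{\Phi}(t, \tau)\mathbf{D}\mathbf{u}\,d\tau$$

$$\frac{\partial}{\partial t}\boldsymbol{\Phi}(t, \tau) = \mathbf{F}(t)\boldsymbol{\Phi}(t, \tau), \qquad \boldsymbol{\Phi}(\tau, \tau) = \mathbf{I}$$

All the discussion under “Solution-Space Analysis” applies after
more or less obvious modifications. The central difficulty
involved here is the calculation of Φ(t, τ). Some of these problems
are treated by Friedland [14].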


Conclusion 


In this paper, the author has presented the following points: 
1 The general optimal control problem has been formulated 
as a nonlinear programming problem which permits practical
solution for certain cases. In particular, this is useful for a
system which permits essentially open-loop operation or for a system
whose disturbances are controlled by auxiliary means such as in
conditional feedback systems.

2 The discrete time optimal control problem is investigated
in detail by fairly elementary algebraic and geometrical means
which reveal some known as well as new properties concerning the
solution.

It is to be emphasized again that no practical solution exists 
today for implementing time optimal control in a feedback man- 
ner. Using Kalman’s recursive approach, one needs an n-variable 
function generator, while for the author’s approach, a very high- 
speed computer is necessary to solve (P-3) repeatedly at every 
step. Neither of these is very promising at this state of our 
technology. The development of an approximate optimal control 
policy which may be practical is an interesting area of investiga- 
tion as pointed out by Kalman [15]. Here the main problem is 
the stability of such systems. Lyapunov’s method is of great 
help. However, one is often forced to assume a Lyapunov func- 
tion a priori. Consequently, the result of optimization is de- 
pendent on the assumption. Much more work is needed on this 
difficult problem.


APPENDIX


Discussion of Complete Controllability and Requirement
det(H) ≠ 0


The concept of complete controllability has been mathematically
defined by Kalman in a recent paper [16]. In Section 4
of the reference, a theorem was stated and proved to the effect 
that 


are linearly independent if and only if the system is completely
controllable. From this it follows immediately that det(H) ≠ 0
for a completely controllable system as we claimed in the section,
“Solution-Space Analysis.” (Note the vector a in this paper differs
from the vector a in [16] by the matrix factor Φ. All other
notations are the same.) Two further conditions for complete
controllability were stated by Kalman as:


1 In the Jordan canonic form of F no two blocks are asso- 
ciated with the same eigenvalue. 
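2 For all eigenvalues of F,

$$\operatorname{Im}\left[\lambda_i(\mathbf{F}) - \lambda_j(\mathbf{F})\right] \ne 2\pi q/T \quad \text{for } \operatorname{Re}\left[\lambda_i(\mathbf{F}) - \lambda_j(\mathbf{F})\right] = 0$$

where q is any integer. This is the Kalman-Bertram criterion.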
For a rigorous proof of 1 and 2, refer to [16]. We shall provide 
here a simple physical interpretation of the conditions for those 
who are interested. 

Condition 1 is stipulated to insure that the n homogeneous 
solutions of equation (1) will be linearly independent. Since 
the elements of matrix H are linearly related to values of these 
n solutions evaluated at n time instants, linear independence is 
necessary to insure that det(H) ≠ 0. A drastic example of this
is the following set of equations: 


= 22 + U, = & 


Here the time optimal control problem has no solution unless 
$c_1 = c_2$. Note that occurrence of the same eigenvalue does not
necessarily imply the existence of double poles in the system trans- 
fer function. The second condition, the Kalman-Bertram cri- 
terion, is required to insure further that the n homogeneous solu- 
tions when linearly operated upon to produce the matrix H do 
not become dependent. This can happen according to the follow- 
ing process: 

Suppose an impulse is applied at the input u,,. The n impulse 
responses observed at the n outputs are linearly related to the 
n homogeneous solutions of equation (1). Now we consider a 
length of time and divide it into n sampling periods. Each im- 
pulse response is integrated over these n sampling periods to form 
n values in a row. The n rows form a matrix H′ which is related
to the matrix H by the equation $\mathbf{H} = \boldsymbol{\Phi}^{-n}\mathbf{H}'$. Now we ask
whether any two rows of H can be made scalar multiples of each
other because of the process of integration. This can certainly
be made to happen if the Kalman-Bertram criterion is not
satisfied, since when the integration (sampling) period is made equal
to multiples of the difference of natural frequencies of the system,
the contributions to integration by the imaginary part of these
eigenvalues are lost. Consequently, two eigenvalues with equal
real parts can give rise to two rows of H which are scalar multiples
of each other.


The Optimum Response of Second-Order, Velocity-Controlled Systems
With Contactor Control


This paper is concerned with the optimum control problem for plants described by second- 
order differential equations with constant coefficients and with velocity control.
Emphasis is placed on the case where the characteristic equation of the system has one zero root
and two complex conjugate roots. The problem is studied in terms of the motion of the 
phase point in a three-dimensional phase space. An iteration method is developed to ob- 
tain the optimum trajectory, which in turn gives the optimum response. 


Introduction 


A physical system described by an equation of the form

$$\frac{d^2e}{dt^2} + 2\zeta\frac{de}{dt} + e = u(t) = -\operatorname{sgn} F \tag{1}$$

is said to be a second-order, position-controlled system; and a
physical system described by an equation

$$\frac{d^3e}{dt^3} + 2\zeta\frac{d^2e}{dt^2} + \frac{de}{dt} = u(t) = -\operatorname{sgn} F \tag{2}$$

is said to be a second-order, velocity-controlled system. The
quantity e(t) is the error of the system under consideration; ζ
is the damping factor; u(t) = −sgn F is the control function,
which is the output of the contactor shown in Fig. 1; and F is the
switching function, which is the input of the contactor. The
purpose of the control function u(t) is to reduce the error and its
derivatives to zero simultaneously. When the switching function
F is prescribed as a linear combination of the error and its
derivatives, both systems given by equations (1) and (2) were
considered by the first author [1].³ It is natural to ask the
question: “How should the control function u(t) be chosen such that
an initial error and its derivatives are reduced to zero
simultaneously in minimum time?” For the system given by equation
(1) Bushaw solved this problem in a PhD dissertation, which is
summarized in [2], using topological arguments which are
difficult to extend to higher order systems. The similar problem
for equation (2) was suggested by Bushaw. To our knowledge,
this problem has not been treated satisfactorily for the case |ζ| <
1, which we shall discuss in this paper.

Fig. 1 Block diagram for equation (2); r − c = e

¹ This research was supported by the United States Air Force
through the Air Force Office of Scientific Research of the Air
Research and Development Command, under Contract Number AF49-
(638)-513. Reproduction in whole or in part is permitted for any
purpose of the United States Government.

² Formerly, Graduate Student at Stanford University, Stanford,
Calif.

³ Numbers in brackets designate References at end of paper.

Contributed by the Instruments and Regulators Division of The
American Society of Mechanical Engineers and presented at
the Joint Automatic Control Conference, Cambridge, Mass.,
September 7-9, 1960. Manuscript received at ASME Headquarters,
June 6, 1960. ASME Paper No. 60-JAC-3.

Both cases mentioned in the foregoing are special cases of the 
famous optimum control problem. A restricted form of this 
problem is stated briefly as follows: 

Consider the motion of the phase point $x = (x^1, x^2, \ldots, x^n)$ in
the n-dimensional vector space $X_n$ in accordance with the vector
equation

\[ \frac{dx}{dt} = Ax + bu \tag{3} \]

where $A$ is a linear transformation with real coefficients, $b$ is a
constant vector in $X_n$, and $u$ is a piecewise continuous scalar
function with $|u| \le 1$. It is required to find $u$ satisfying the
foregoing restrictions such that the phase point $x$ can be steered from
an initial point $x_0$ to the origin $O$ of $X_n$ in accordance with
equation (3) in minimum time.

There are many persons who are responsible for the present 
advancement of the optimum control theory. To name a few, 
there are Bushaw, Rose [3], Bellman, Glicksberg, and Gross [4], 
Bass [5], LaSalle [6] in this country, and Pontrjagin [7], Boltjan- 
ski [8], Gamkrelidze [9], and Krasovskii [10] in Soviet Russia. 
The most general theory to date is the maximum principle pre- 
sented by Pontrjagin. However, additional work is required to 
perfect the theory. For the system described by equation (3), 
Gamkrelidze proved rigorously the existence and the uniqueness 
theorems for the solution and he showed that the optimum con- 
trol function for equation (3) must be of the form 


\[ u = \operatorname{sgn}[b, \psi] \tag{4} \]

In equation (4), $[b, \psi]$ is the scalar product of $b$ and the vector
function $\psi$. The vector function $\psi = (\psi_1, \psi_2, \ldots, \psi_n)$ is the
solution of the adjoint equation of equation (3),

\[ \frac{d\psi}{dt} = -A'\psi \tag{5} \]

where $A'$ is the adjoint of $A$.
It is equation (4) together with the uniqueness theorem stated
in the Appendix that are of interest in the consideration of the
problem treated in this paper.

Optimum Control Function for Equation (2)

With the help of equation (4) and the uniqueness theorem, we 
can proceed to obtain the expression of the optimum control for 
equation (2). We shall study the problem in phase space by 
writing equation (2) as 

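\[ \frac{d}{dt}\begin{bmatrix} e^1 \\ e^2 \\ e^3 \end{bmatrix} =
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & -2\zeta \end{bmatrix}
\begin{bmatrix} e^1 \\ e^2 \\ e^3 \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} u \tag{6} \]

with the new variables $e^1 = e$, $e^2 = e'$, and $e^3 = e''$. The solution
of equation (2) is now represented by the motion of the point
$e = (e^1, e^2, e^3)$ in a three-dimensional phase space $X_3$. We observe
that equation (6) is of the same form as equation (3).

For the case $|\zeta| < 1$, it is convenient to introduce the transformation

\[ \begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} =
\begin{bmatrix} 0 & 1 & \zeta \\ 0 & 0 & \nu \\ 1 & 2\zeta & 1 \end{bmatrix}
\begin{bmatrix} e^1 \\ e^2 \\ e^3 \end{bmatrix} \tag{7} \]

where $\nu = (1 - \zeta^2)^{1/2}$. In terms of the new co-ordinates equation
(6) becomes

\[ \frac{d}{dt}\begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} =
\begin{bmatrix} -\zeta & \nu & 0 \\ -\nu & -\zeta & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} +
\begin{bmatrix} \zeta \\ \nu \\ 1 \end{bmatrix} u \tag{8} \]

In comparison with equation (3) we have $x = (x^1, x^2, x^3)$,
$b = (\zeta, \nu, 1)$, and the linear transformation $A$ is represented by
the coefficient matrix in equation (8).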
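The change of co-ordinates can be checked numerically. The following is a minimal sketch, assuming the matrices of equations (6), (7), and (8) are as read here: phase-variable form with last row $(0, -1, -2\zeta)$, transformation rows $(0, 1, \zeta)$, $(0, 0, \nu)$, $(1, 2\zeta, 1)$, and $b = (\zeta, \nu, 1)$. The function names are illustrative only. It verifies $T A_e = A_x T$ and $T b_e = b_x$, which is equivalent to saying that equation (7) carries equation (6) into equation (8).

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transformation_residuals(zeta):
    """Return (max |T A_e - A_x T|, max |T b_e - b_x|) for a given |zeta| < 1."""
    nu = math.sqrt(1.0 - zeta ** 2)
    # Equation (6): phase-variable form of equation (2).
    a_e = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, -2.0 * zeta]]
    b_e = [[0.0], [0.0], [1.0]]
    # Equation (7): transformation to the new co-ordinates.
    t = [[0.0, 1.0, zeta], [0.0, 0.0, nu], [1.0, 2.0 * zeta, 1.0]]
    # Equation (8): system in the new co-ordinates.
    a_x = [[-zeta, nu, 0.0], [-nu, -zeta, 0.0], [0.0, 0.0, 0.0]]
    b_x = [[zeta], [nu], [1.0]]

    lhs, rhs = matmul(t, a_e), matmul(a_x, t)
    err_a = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
    tb = matmul(t, b_e)
    err_b = max(abs(tb[i][0] - b_x[i][0]) for i in range(3))
    return err_a, err_b
```

For $\zeta = 0$ the transformation reduces to $x^1 = e'$, $x^2 = e''$, $x^3 = e + e''$, which agrees with the initial points quoted in the Examples section.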
During any switching interval, an interval during which u is 
a constant, the solution of equation (8) in component form is 
given by 


\[ \begin{aligned}
x^1 &= u + r_0 e^{-\zeta t} \cos(\nu t - \theta_0) \\
x^2 &= -r_0 e^{-\zeta t} \sin(\nu t - \theta_0) \\
x^3 &= x_0^3 + ut
\end{aligned} \tag{9} \]

where $r_0$, $\theta_0$, $x_0^3$ are determined by the initial values of the variables
in the switching interval under consideration. Upon the elimination
of $t$ in equation (9) we can visualize the trajectory easily by
its projection in the $x^1x^2$-plane with $x^3$ appearing as parameter.
It can be shown easily that the projection of the trajectory in a
switching interval in the $x^1x^2$-plane is a logarithmic spiral with
focus at (1, 0) for $u = 1$ or with the focus at (−1, 0) for $u = -1$.
The change in $x^3$ between two points on the same spiral, as well
as the angle $\Delta\theta$ subtended by these two points measured from the
focus of the spiral, is proportional to $t$. In fact, we have

\[ \Delta x^3 = u\,\Delta t \quad \text{and} \quad \Delta\theta = -\nu\,\Delta t \tag{10} \]

These are demonstrated in Fig. 2.
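The closed-form solution over one switching interval can be cross-checked against a direct numerical integration of equation (8). The sketch below is illustrative only: it assumes the component form of equation (9) with the focus at $(u, 0)$ and the polar angle about the focus decreasing at the rate $\nu$, and it uses a standard fourth-order Runge-Kutta step that is not part of the paper. The function names are hypothetical.

```python
import math


def closed_form(x0, u, zeta, t):
    """Evaluate the constant-u solution of equation (8) at time t."""
    nu = math.sqrt(1.0 - zeta ** 2)
    r0 = math.hypot(x0[0] - u, x0[1])          # initial radius about the focus (u, 0)
    theta0 = math.atan2(x0[1], x0[0] - u)      # initial polar angle about the focus
    r = r0 * math.exp(-zeta * t)
    return (u + r * math.cos(nu * t - theta0),
            -r * math.sin(nu * t - theta0),
            x0[2] + u * t)


def rhs(x, u, zeta):
    """Right-hand side of equation (8)."""
    nu = math.sqrt(1.0 - zeta ** 2)
    return (-zeta * x[0] + nu * x[1] + zeta * u,
            -nu * x[0] - zeta * x[1] + nu * u,
            u)


def rk4(x0, u, zeta, t, steps=2000):
    """Integrate equation (8) from 0 to t with constant u (reference solution)."""
    h = t / steps
    x = tuple(x0)
    for _ in range(steps):
        k1 = rhs(x, u, zeta)
        k2 = rhs(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)), u, zeta)
        k3 = rhs(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)), u, zeta)
        k4 = rhs(tuple(xi + h * ki for xi, ki in zip(x, k3)), u, zeta)
        x = tuple(xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
    return x
```

Comparing `closed_form` with `rk4` for the same initial point, control value, and elapsed time gives a quick consistency check on the signs in equation (9); the third component changes by exactly $u\,\Delta t$ in both.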
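The adjoint equation of equation (8) is

\[ \frac{d}{dt}\begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \end{bmatrix} =
\begin{bmatrix} \zeta & \nu & 0 \\ -\nu & \zeta & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \end{bmatrix} \tag{11} \]

The solution in component form is

\[ \begin{aligned}
\psi_1 &= C_1 e^{\zeta t} \cos(\nu t - \delta^*) \\
\psi_2 &= -C_1 e^{\zeta t} \sin(\nu t - \delta^*) \\
\psi_3 &= C_2
\end{aligned} \tag{12} \]

where $C_1$, $C_2$, and $\delta^*$ are determined by the unknown initial
conditions of equation (11).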

It follows from equations (4), (8), and (12) that the optimum 
control for the system under consideration must have the form 
\[ \begin{aligned}
u(t) &= \operatorname{sgn}\left[\zeta C_1 e^{\zeta t} \cos(\nu t - \delta^*)
        - \nu C_1 e^{\zeta t} \sin(\nu t - \delta^*) + C_2\right] \\
     &= \operatorname{sgn}\left[C^* e^{-\zeta t} - \cos(\nu t - \delta)\right]
\end{aligned} \tag{13} \]

where we have introduced $\delta = \delta^* + \phi$ with $\phi = \cos^{-1}(-\zeta)$ and
$C^* = C_2/C_1$ with $C_1 > 0$.

If the constants $C^*$ and $\delta$ can be related to the position of the
initial point $x_0$, then the optimum control problem for equation
(2) is solved. Thus, the equivalent problem of equation (2) is
to find the constants $C^*$ and $\delta$ such that the trajectory of the
phase point passes through the initial point $x_0$ and the origin $O$ of
$X_3$ in accordance with equations (8) and (13).
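The step between the two forms of the control function in equation (13) can be written out explicitly. This is a sketch of the algebra only, assuming $\nu = (1 - \zeta^2)^{1/2}$ as in the transformation of equation (7), so that $\phi = \cos^{-1}(-\zeta)$ gives $\cos\phi = -\zeta$ and $\sin\phi = \nu$:

```latex
\begin{aligned}
\zeta\cos(\nu t-\delta^*) - \nu\sin(\nu t-\delta^*)
  &= -\left[\cos\phi\,\cos(\nu t-\delta^*) + \sin\phi\,\sin(\nu t-\delta^*)\right] \\
  &= -\cos(\nu t-\delta^*-\phi) = -\cos(\nu t-\delta), \\[4pt]
\zeta C_1 e^{\zeta t}\cos(\nu t-\delta^*) - \nu C_1 e^{\zeta t}\sin(\nu t-\delta^*) + C_2
  &= C_1 e^{\zeta t}\left[\,C^* e^{-\zeta t} - \cos(\nu t-\delta)\right],
  \qquad C^* = C_2/C_1 .
\end{aligned}
```

Since $C_1 e^{\zeta t} > 0$ for $C_1 > 0$, dropping this factor does not change the sign, which gives the second form of the control function.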

Since the number of zeros of $u(t)$ in equation (13) varies with
the time interval $T$ during which $u(t)$ is valid, the representation of
$C^*$ and $\delta$ in terms of $x_0$ is difficult. If we make use of the
uniqueness theorem, we could study the sets of phase points which can
be steered to the origin of $X_3$ in 0, 1, 2, ..., n changes of sign of
equation (13) by replacing $t$ by the reverse time $\tau$. Then

\[ u(\tau) = \operatorname{sgn}\left[C' e^{\zeta\tau} - \cos(\nu\tau + \delta')\right] \tag{14} \]

where $|C'| \ge 0$ and $0 \le \delta' < 2\pi$. For any given set of $C'$ and $\delta'$,
the zeros of (14) are found to be $\tau_1, \tau_2, \tau_3, \ldots$. From these $\tau_i$'s
the switching points (points on the trajectory where $u(\tau)$ or $u(t)$
changes sign; the origin is not considered as a switching point)
are determined in $X_3$. Upon choosing various values for the pair
$(C', \delta')$ through their permissible ranges, we can obtain a cluster
of switching points in $X_3$. If the cluster is dense enough, they
represent a reasonable facsimile of the true optimum switching
surface. More systematically, we could first choose the value of
$C'$, then vary the value of $\delta'$ through its range to obtain a
switching curve which lies on the optimum switching surface.
Then the value of $C'$ is varied to get the optimum switching surface.
As we shall soon see, the switching curves are complicated
enough in structure to indicate the futility of the construction of
the switching surface. Instead, we shall develop an iteration
procedure to obtain the optimum trajectory of equation (8). If
the optimum switching curves for specific values of $C'$ are available,
the proposed procedure gives the optimum trajectory. If
approximations for the optimum switching curves are used, then
the trajectory obtained will also be approximate.

For simplicity, we shall first consider the case where $\zeta = 0$ in
the following section.
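The zeros of the reverse-time control function can be located numerically for any trial pair $(C', \delta')$. The following is a minimal sketch, not the authors' procedure: it assumes the argument of equation (14) is $g(\tau) = C' e^{\zeta\tau} - \cos(\nu\tau + \delta')$, scans a uniform grid for sign changes, and refines each bracket by bisection. The function name, grid step, and horizon `tau_max` are illustrative assumptions; zeros at which $g$ touches zero without changing sign are ignored, since they do not switch the contactor.

```python
import math


def switching_times(c, delta, zeta, tau_max, step=1e-3):
    """Reverse times in (0, tau_max) at which the control of eq. (14) changes sign."""
    nu = math.sqrt(1.0 - zeta ** 2)

    def g(tau):
        return c * math.exp(zeta * tau) - math.cos(nu * tau + delta)

    zeros = []
    n = int(math.ceil(tau_max / step))
    for i in range(n):
        a, b = i * step, min((i + 1) * step, tau_max)
        ga, gb = g(a), g(b)
        if ga * gb < 0.0:                      # sign change inside [a, b]
            for _ in range(60):                # bisection refinement
                m = 0.5 * (a + b)
                gm = g(m)
                if ga * gm <= 0.0:
                    b = m
                else:
                    a, ga = m, gm
            zeros.append(0.5 * (a + b))
    return zeros
```

With $\zeta = 0$ the zeros are periodic, as the next section uses; with $\zeta \ne 0$ the spacing between successive zeros changes with $\tau$, which is what makes the exact switching curve hard to construct by hand.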


Fig. 2 Projection of trajectories in the x¹x²-plane

The Optimum Trajectory for ζ = 0


Upon setting $\zeta = 0$ in equation (9), we observe that the trajectories
in $X_3$ are formed by arcs of circular helices with their axes
at (1, 0) and (−1, 0) alternately, and equation (14) becomes

\[ u(\tau) = \operatorname{sgn}\left[C' - \cos(\tau + \delta')\right] \tag{15} \]

Examination of equation (15) shows that the zeros of $u(\tau)$ are
periodically spaced along the $\tau$-axis. The parameters $C'$ and $\delta'$
set, respectively, the bias level and the phase of a sinusoid.

For any given $C'$, it is not difficult to show that the following
construction gives the projection of the optimum switching curve
in the $x^1x^2$-plane.

(a) The Construction of Switching Curve for Given $C'$. Given $C'$,
$\cos^{-1}(C')$ has two values in the range from 0 to $2\pi$. Let these be
$\delta_1'$ and $\delta_2'$, with $\delta_2' > \delta_1'$. If we have $\delta_1' < \delta' < \delta_2'$ in equation
(15), then $\cos\delta' < C'$ and $u(\tau = 0+) = 1$. The quantity
$\Theta = \delta_2' - \delta_1'$ is the maximum switching interval in radians
corresponding to $u = 1$. In other words, the part of a trajectory
corresponding to $u = 1$ subtends an angle $\Theta$ between switching
points. It follows from the periodic nature of equation (15) that
the maximum switching interval corresponding to $u = -1$ is
$\Theta' = 2\pi - \Theta$; that is, the part of a trajectory corresponding to
$u = -1$ subtends an angle $\Theta'$ between switching points.
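As a worked illustration of the two switching intervals, the short sketch below computes the angle subtended by the $u = 1$ arcs and the angle subtended by the $u = -1$ arcs from a given $C'$. It assumes $|C'| < 1$, so that $\cos^{-1}(C')$ has two distinct values in $(0, 2\pi)$; the function name is hypothetical.

```python
import math


def switching_intervals(c):
    """Return (theta, theta_prime): angles subtended by the u = +1 and u = -1 arcs."""
    if not -1.0 < c < 1.0:
        raise ValueError("this sketch assumes |C'| < 1")
    delta_1 = math.acos(c)                 # smaller root of cos(delta) = C'
    delta_2 = 2.0 * math.pi - delta_1      # larger root in (0, 2*pi)
    theta = delta_2 - delta_1              # interval on which cos(delta) < C', u = +1
    theta_prime = 2.0 * math.pi - theta    # remaining part of the period, u = -1
    return theta, theta_prime
```

For $C' = 1/2$, the value used in Fig. 3, the roots are 60° and 300°, so the $u = 1$ arcs subtend 240° and the $u = -1$ arcs subtend 120°. For $C' = 0$ both angles equal 180°.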


Fig. 3 Projection of switching curve in the x¹x²-plane for C′ = 1/2, ζ = 0

To construct the part of the switching curve indicated by $\Gamma_+$ in
Fig. 3, we construct first a circular arc of unity radius with its
center located at $S_1(1, 0)$ in the $x^1x^2$-plane. The arc starts from
the origin $O$ in the direction of increasing $t$ and subtends an angle
$\Theta$ ending at $\xi_1$. This is the first arc on $\Gamma_+$. To obtain the second
arc, we extend the line $S_1\xi_1$ to $S_1'$ with $S_1\xi_1 = \xi_1 S_1'$; then we form
a circular arc of unity radius with center at $S_1'$ and subtending
an angle $\Theta'$. This arc is tangential to the first arc at $\xi_1$ and ends
at $\xi_2$. These two arcs form a unit group. The other arcs on $\Gamma_+$
are simply the repetition of these two arcs, and are aligned in
such a way that all the arcs are tangential to the neighboring
ones. This completes the construction of $\Gamma_+$.

For the construction of $\Gamma_-$, we simply replace the primed quantities
in the foregoing description by the unprimed ones and vice
versa. The entire switching curve for fixed $C'$, $\Gamma$, consists of $\Gamma_+$
and $\Gamma_-$. The switching curve for $C' = 1/2$ is shown in Fig. 3.

It is interesting to note that the inclinations of $\Gamma_+$ and $\Gamma_-$ depend
on the magnitudes of $\Theta$ and $\Theta'$, which in turn depend on the
choice of $C'$. For large values of $|C'|$, $\Gamma_+$ and $\Gamma_-$ will cross over
each other in the $x^1x^2$-plane. If $C'$ is set equal to zero then $\Theta = \Theta'$
and all $S_i$ and $S_i'$ lie on the $x^1$-axis. The switching curve is the
same as the one obtained by Bushaw.

(b) The Iteration Procedure. It is obvious that, employing the
switching curve constructed in the manner described, an initial
point $x_0 = (x^1, x^2, x^3)$ can be steered to another point
$x_f = (0, 0, x_f^3)$ optimally, where

\[ x_f^3 = x^3 + \sum_{i=1}^{n} u_i\,\Delta\theta_i \tag{16} \]

and $u_i$ is the value of the control function during the $i$th switching
interval $\Delta\theta_i$; $n$ is the total number of switching intervals. From
equation (14), the $u_i$'s alternate between 1 and −1, and the $\Delta\theta_i$,
$i = 2, 3, \ldots, n - 1$, alternate between $\Theta$ and $\Theta'$.

If the parameter $C'$ is correctly chosen then $x_f = 0$ and the
trajectory obtained is the optimum solution of equation (8).
Thus we can form an iteration method for the determination of
the optimum value of $C'$ and the corresponding trajectory merely
by examining the size of $x_f^3$.

We assume $C'$ is a continuous function of $x_f^3$ when the initial
point $x_0$ is steered to $x_f = (0, 0, x_f^3)$ by means of the switching
curve for arbitrary $C'$.⁴ Let the sequences of the trial values of
$C'$ and the corresponding $x_f^3$ be $\{C_n'\}$ and $\{x_{f(n)}^3\}$, respectively.
Then we can form the difference quotient

\[ Q_n = \frac{C_n' - C_{n-1}'}{x_{f(n)}^3 - x_{f(n-1)}^3} \tag{17} \]

where $C_n' = C_{\mathrm{opt}}' + \Delta C_n$ and $x_{f(n)}^3$ are the values of $C'$ and $x_f^3$
for the $n$th trial, and $C_{\mathrm{opt}}'$ is the optimum value for $C'$. From a very
elementary argument, we conclude that: if $Q_n$ is positive,
$\operatorname{sgn}(\Delta C_n) = \operatorname{sgn} x_{f(n)}^3$; and if $Q_n$ is negative,
$\operatorname{sgn}(\Delta C_n) = -\operatorname{sgn} x_{f(n)}^3$. From the definition of $\Delta C_n$ it is obvious
that the choice of the next trial value $C_{n+1}'$ is determined by:

\[ \begin{aligned}
\text{If } \Delta C_n > 0 &\text{ choose } C_{n+1}' < C_n' \\
\text{If } \Delta C_n < 0 &\text{ choose } C_{n+1}' > C_n'
\end{aligned} \tag{18} \]

This trial procedure is continued until $x_f^3$ is zero or some
satisfactorily small value. In addition to equation (18), we may choose
the successive $C_n'$ to be the average of two previous trial values
$C_i'$ and $C_j'$, where $x_{f(i)}^3$ is the minimum of the sequence of $x_f^3 > 0$
in the previous trials, and $x_{f(j)}^3$ is the maximum of the sequence of
$x_f^3 < 0$ in the previous trials. The effectiveness of the
iteration method will be demonstrated later.
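The averaging rule described above amounts to a bracketing iteration on the trial parameter. The following is a minimal sketch under stated assumptions, not the authors' computation: `final_x3` stands for whatever procedure (graphical or numerical) maps a trial value of $C'$ to the residual third co-ordinate at the end of the trial trajectory, and the linear stand-in at the bottom is purely hypothetical and carries no data from the paper. The sketch also assumes the two starting trials give residuals of opposite sign and that the residual varies continuously between them.

```python
def iterate_c(final_x3, c_low, c_high, tol=1e-3, max_trials=50):
    """Average the two bracketing trial values of C' until |x_f^3| <= tol.

    final_x3 -- callable returning the residual x_f^3 for a trial C' (assumed given)
    c_low, c_high -- starting trials whose residuals have opposite signs
    Returns (c, residual, trials), where trials lists every (C', x_f^3) pair tried.
    """
    x_low, x_high = final_x3(c_low), final_x3(c_high)
    trials = [(c_low, x_low), (c_high, x_high)]
    if x_low * x_high > 0.0:
        raise ValueError("starting trials must give residuals of opposite sign")
    c_mid, x_mid = (c_low, x_low) if abs(x_low) <= abs(x_high) else (c_high, x_high)
    for _ in range(max_trials):
        if abs(x_mid) <= tol:
            break
        c_mid = 0.5 * (c_low + c_high)        # average of the two bracketing trials
        x_mid = final_x3(c_mid)
        trials.append((c_mid, x_mid))
        if x_mid * x_low > 0.0:               # same sign as the low end: move it up
            c_low, x_low = c_mid, x_mid
        else:                                 # otherwise tighten the high end
            c_high, x_high = c_mid, x_mid
    return c_mid, x_mid, trials


def toy_final_x3(c):
    """Hypothetical stand-in for the trajectory construction (not from the paper)."""
    return 8.0 * (c - 0.707)
```

In practice each call to `final_x3` is the expensive step, since it requires constructing the switching curve for the trial $C'$ and tracing the trajectory; where several trial trajectories exist for one $C'$, a rule for selecting among them would also be needed.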


Periodic Approximation for ζ ≠ 0


When $\zeta \ne 0$, the switching curve in $X_3$ corresponding to equation
(14) is difficult to find. We could enlist the help of computing
devices in obtaining such a switching curve. However, if
we confine the initial points to a region such that they can be
steered to the origin of $X_3$ in a relatively small number of switchings,
then we may approximate the control function in equation
(14) by

\[ u(\tau) = \operatorname{sgn}\left[C' - \cos(\nu\tau + \delta')\right] \tag{19} \]

The construction of switching curves derived from equation
(19) is relatively easy. In fact, a similar method of construction
to that outlined in the preceding section applies here. The
periodic switching curves in the $x^1x^2$-plane for $0 < \zeta < 1$ and
$-1 < \zeta < 0$ with $C' = 1/2$ are shown in Figs. 4 and 5, respectively.
We notice from Fig. 5 that the projection of the switching
curve in the $x^1x^2$-plane for given $C'$ is bounded for $-1 < \zeta < 0$.
If the controllable region $M$ is defined to be the region within
which any initial point $x_0$ can be steered to $O$ by means of an
admissible control (not necessarily an optimum control), we see
from Figs. 3 and 4 that $M$ coincides with $X_3$ for $0 \le \zeta < 1$. It is
easy to show that for $-1 < \zeta < 0$, the projection of $M$ in the
$x^1x^2$-plane corresponds to $C' = 0$ and it is the same as that of the
second-order, position-controlled system considered by Bushaw.
There is no restriction along the $x^3$-direction.

⁴ Generally speaking this assumption is not true everywhere. For
instance, if the initial point and the control parameter $C'$ were so
chosen that the switching points of the trial trajectory are located
at one or more of the cusps of the switching curve, then a slight
change of $C'$ may result in a sudden change of $x_f^3$. However, this
can be avoided by choosing another value of $C'$.

Fig. 4 Projection of periodic switching curve in the x¹x²-plane for C′ = 1/2, 0 < ζ < 1

Fig. 5 Projection of periodic switching curve in the x¹x²-plane for C′ = 1/2, −1 < ζ < 0

For points lying in the controllable region $M$, the iteration procedure
proposed in the preceding section will yield an approximate
trajectory passing through $x_0$ and the origin $O$. The approximation
is good if the approximate trajectory and the optimum
trajectory both have the same number of switching points.


Case of Step Input 


A step input is often used to study the system performance. For the
present case step inputs correspond to initial points on the $x^3$-axis.
We shall consider separately the case for $\zeta \ne 0$ and the
case for $\zeta = 0$.

(a) If $\zeta \ne 0$, then the optimum trajectory passing through the
initial point $x_0 = (0, 0, x_0^3)$ and the origin $O$ corresponds to exactly
two switchings of the contactor and $u(t = 0+) = -\operatorname{sgn} x_0^3$.
That this statement is true can be shown by construction
and applying the uniqueness theorem [11].

(b) If $\zeta = 0$, the foregoing is no longer true. This is evident
from equation (15). Further considerations lead to the following
observations: that $u(t = 0+) = -\operatorname{sgn} x_0^3$; that the trial values
for $C'$ must have $|C'| > \cos 30°$; that the total number of
switchings required for the optimum trajectory is even. These
are demonstrated later by an example.

Examples


To demonstrate the effectiveness of the iteration method, let us
consider the following examples: The first example is for general
initial values and the second example is for step input. For
simplicity, only the case $\zeta = 0$ is considered.

Example 1. The initial values are $e_0 = -5.67$, $e_0' = -3.6$,
$e_0'' = 2.0$, which correspond to a point $x_0 = (-3.6, 2.0, -3.67)$ in
$X_3$. The iterations are given in Table 1.


Table 1

Trial  C_n'               x_f(n)^3       Q_n   ΔC_n  No. of switchings
1      0.9781 (cos 12°)   (a) 24.15                  12
                          (b) 22.5                   10
                          (c) 20.85                  10
                          (d) 26.8                   12
2      0 (cos 90°)        −3.91          >0    <0    2
3      0.5 (cos 60°)      −2.35          >0    <0    3
4      0.7314 (cos 43°)   0.192          >0    <0    4
5      0.6157 (cos 52°)   −1.745         >0    <0    4
6      0.682 (cos 47°)    (a) −3.02
                          (b) −0.576     >0    <0    4
                          (c) −1.85
7      0.707 (cos 45°)    (a) −1.87
                          (b) 0          >0    ind   4


Since we are working with the projection of the switching curve
in the $x^1x^2$-plane, there may exist several trial trajectories for
given $C'$. These are indicated by (a), (b), ... in Table 1. The
projection of the trajectory in the $x^1x^2$-plane for $C_7' = 0.707 = \cos 45°$
and the corresponding error response are shown in
Figs. 6 and 7. The initial point $x_0$ is indicated by $P_0$. The
switching points are indicated by $P_1, P_2, \ldots$.

Example 2. The initial values are $e_0 = -6.63$, $e_0' = 0$, $e_0'' = 0$,
which correspond to an initial point $x_0 = (0, 0, -6.63)$ in $X_3$.
The iterations are given in Table 2, in which we have listed only
the value of $x_f^3$ which is nearest to zero for each $C'$ chosen.


Table 2

Trial  C_n'               x_f(min)^3   Q_n   ΔC_n  No. of switchings
1      0.9962 (cos 5°)    −0.39              >0    4
2      0.9703 (cos 14°)   +0.35        <0    <0    4
3      0.9848 (cos 10°)   −0.19        <0    >0    4
4      0.9816 (cos 11°)   −0.07        <0    >0    4
5      0.9744 (cos 13°)   +0.21        <0    <0    4
6      0.9781 (cos 12°)   +0.07        <0    <0    4


Fig. 6 Projection of switching curve and projection of trajectory for
example 1 with C₇′ = cos 45°

In Table 2, if further trials were carried out, evidently the
value of $C'$ must be $0.9781 < C_n' < 0.9816$. The projection
of the trajectory in the $x^1x^2$-plane for $C_4' = \cos 11° = 0.9816$
and the corresponding error response are shown in Figs. 8 and 9.
There is a residual error of $e_f = -0.07$. We see from this example
that the iteration method loses much of its accuracy for
large magnitude of $C'$. This is a drawback of the present scheme.


With the aid of the recently developed results in the optimum 
control theory, we have shown an iteration method which will 
yield the solution to the problem for second-order, velocity-con- 
trolled systems. If the switching curves used in the iteration 
are derived from equation (14), the solution will be optimum. 
If the switching curves used in the iteration are periodic approxi- 
mations then the solution will be nearly optimum. The method 
seems to be quite tedious; however, we are not aware of any easier 
method for solving the optimum control problem. By means 
of the present scheme certain features of the second-order, ve- 
locity-controlled systems with complex characteristic roots 
can be exhibited. A more detailed account is given in [11]. 

The realization of the optimum systems may present some engineering
difficulties. However, if the second-order, position-controlled
systems considered by Bushaw can be realized at all,
then the present scheme presents no special difficulty in its
realization. Needless to say, such realizations would always be much
more complicated in structure than the realization of linear
switching functions considered in [1]. Such is the price one
has to pay to obtain the optimum performance.

Fig. 7 Error response for example 1 with C₇′ = cos 45°

Fig. 8 Projection of trial trajectory for example 2 with C₄′ = cos 11°

Finally it should be mentioned that, for third-order systems
whose characteristic equation has real roots only, the optimum
switching criterion can be expressed in a much simpler way as a
function of the phase variables (see for instance [12]). However,
complex roots occur in many technical problems which can
be described by a third-order differential equation; this fact
instigated the present investigation.


APPENDIX 


The existence and uniqueness theorems proved by Gamkrelidze
for the system described by equation (3) are stated as follows.

Existence Theorem. If there exists an admissible control which
will steer the phase point $x$ from position $\xi_0$ to position $\xi_1$ in
accordance with equation (3), then there exists an optimum control
which will steer the phase point from $\xi_0$ to $\xi_1$ along the optimum
trajectory in accordance with equation (3).

If the existence of the optimum control and the optimum trajectory
between the position $\xi_0$ and the origin of the phase space
$X_n$ are assumed and further the set of vectors

\[ b,\; Ab,\; A^2 b,\; \ldots,\; A^{n-1} b \]

are linearly independent in $X_n$, then we have:

Uniqueness Theorem. If $u_1(t)$ and $u_2(t)$ are both optimum controls
which will steer the phase point $x$ from position $\xi_0$ to the
origin $O$ of $X_n$ in time intervals $T_1$ and $T_2$ along trajectories
$x_1(t)$ and $x_2(t)$ in accordance with equation (3), respectively, then
$T_1 = T_2$ and $u_1(t) = u_2(t)$; $x_1(t) = x_2(t)$ for $0 \le t \le T_1 = T_2$.


Pulse-Width Relay Control
in Sampling Systems


The use of pulse-width control for the on-off regulation of systems subject to sampling is
investigated in this paper. The inherent inability of simple on-off control to achieve
accurate, or dead-beat, control in sampling systems is demonstrated, and methods for
predicting and minimizing the limit-cycle behavior of these systems are given. It is
then shown that if the on-off control is modified by allowing control of the "on" time
during each sampling period, it is possible to achieve asymptotically stable dead-beat
control. Sufficient conditions for the asymptotic stability in the large of pulse-width
control systems are developed, using the "second method" of Lyapunov. Practical design
of pulse-width controllers is developed from the theoretical results and experimental
evaluation is given for some specific design problems.


Introduction 


The general subject of this paper is the problem of
nonlinear control in sampled-data systems [1-9],* with major 
attention given to the class of pulse-width control signals [1, 6-8]. 
This class of control signals consists of system correction functions 
whose amplitude is restricted to a fixed positive and negative 
value, and whose duration in time is continuously variable over 
the range of the sampling interval. The somewhat more re- 
stricted class considered here assumes control pulses which be- 
gin at the sampling instant and end at any instant within the 
sampling interval. 

The point of view adopted in formulating the problem of non- 
linear control of systems with sampling is as follows: Consider the 
dynamic system to be controlled (called the plant) as actuated by 
a controller whose function it is to select from a restricted class 
of control signals a sequence of signals which will take the plant 
from an arbitrary initial state into (or close to) the desired state 
in a manner prescribed by performance specifications. If the 
controller receives information on the state of the plant and the 
desired state continuously, the system is a continuous-data sys- 
tem; if it receives information only at discrete instants of time 
(assumed to be periodic), the system is a sampled-data system. 

Attention is focused initially on describing in terms of the state 
of the system general control signal selection procedures which 
satisfy the performance specifications. No attention is given to 
the over-all closed-loop structure of the system until specific de- 
sign examples are considered. For this reason, the initial de- 
scriptions may appear as open-loop systems; of course, some 
closed-loop structure is implied since it is assumed that the state 
of the plant is known. 

In this formulation of the problem, the plant is assumed to be
adequately described by a linear stationary differential equation
or set of equations, with any essential nonlinearities, such as
saturation, included in the restrictions imposed on the controller.

System Description and Terminology


The class of systems considered in this paper is described by
the vector differential equation

\[ \dot{x} = Ax + b\mu(t) \tag{1} \]

where $(\dot{\ })$ denotes differentiation with respect to time, $x$ is an
n-vector with elements $x_i$, $A$ is an $(n \times n)$-matrix with constant
elements $a_{ij}$, $b$ is an n-vector with constant elements $b_i$, and $\mu(t)$
is the control input to the plant.

The plant output, $c(t)$, does not appear explicitly in (1) but is
implicitly contained as a linear combination of the elements of the
vector $x(t)$.

The vector, $x$, in (1) defines a point in n-dimensional Euclidean
space, with the elements $x_1, \ldots, x_n$ forming a basis, or co-ordinate
system, in this space. Given an initial point, $x(t_0)$, and the control
input, $\mu(t)$, the solution of (1) describes a locus of points in the
space, called a trajectory. The point $x(t_0)$ at any time $t_0$ determines
the behavior of the plant for all time $t \ge t_0$, and can be said
to summarize all the past behavior of the plant necessary to describe
the future behavior. The vector, $x(t)$, is therefore called
the state vector or simply the state of the plant at time $t$, and the
space of $x$ is called the state space, $X$, of the plant. The elements,
$x_1, \ldots, x_n$, of the state vector are designated as the state variables
of the plant.

The desired state, $x_d(t)$, represents, in terms of the state space
chosen for the system, a set of values prescribed by the particular
control problem for the output and its $(n - 1)$ time derivatives.
The control problem assumed throughout this study is the regulator
problem, in which the desired output is a constant. This can be
expressed in terms of the state vector with the proper choice of the
state variables by the following assumption:⁴

Assumption I. The desired state of the plant is fixed for all time
at the origin of the state space; i.e., $x_d = 0$.

Control-system design based on this assumption cannot be 
extended in general to control problems where the desired output 
is of the nondeterministic type or is an arbitrary function of 
time. However, it can be extended to systems where the varia- 
tion of the desired output is slow compared with the transient re- 
covery time required in the regulator design. In addition, when 
the desired output can be expressed in the form of a power series
in $t$, starting at $t = 0$, the solution can be obtained by modifying
the solution of the regulator problem [10, 11].

The equilibrium state, $x_e$, is defined as the state for which $\dot{x}$,
given by (1), is zero for all time. If the system is to be in the
equilibrium state at the origin (the desired state), a necessary
condition on the control input is that

\[ \mu(t) = f(x, t) = 0, \quad \text{for } x = 0, \text{ all } t \tag{2} \]

where $f(x, t)$ denotes a linear or nonlinear scalar function of the
state, $x(t)$, and time, $t$.

⁴ The assumptions given apply to all subsequent work in the paper
unless specifically stated otherwise.

When the control of the plant is subject to sampled data, the
dynamic behavior of the plant is still governed by (1). However,
the control input can now be based on the state of the plant only
at the periodic sampling instants. This restriction is expressed
in the functional form:

\[ \mu(t) = f(x(kT), t), \quad kT \le t < (k + 1)T \tag{3} \]

where $x(kT)$ is the state of the plant at the $k$th sampling instant,
and $T$ is the constant sampling period.

Using (3), the differential equation governing the state of the
plant during the $k$th sampling period has the closed-loop form

\[ \dot{x}(t) = Ax(t) + b f(x(kT), t), \quad kT \le t < (k + 1)T, \quad k = 0, 1, 2, \ldots \tag{4} \]

Although the final system description must satisfy this general
closed-loop form, it will be shown that useful design procedures
can be obtained by considering first the simpler open-loop form,
(1), subject to control inputs from a particular class of control
inputs.

The general class of inputs which includes the two restricted 
classes to be considered in this work, namely, relay control and 
pulse-width control, consists of all inputs satisfying the follow- 
ing: 

Assumption II. The control input, $\mu(t)$, is a piecewise constant
function of time bounded a priori for all time by

\[ |\mu(t)| \le M \]

where $M$ is finite and constant.

Therefore, over the interval of time, $t_0 \le t < t_1$, for which
the input is constant at the value $\mu_0$, satisfying Assumption II,
the solution of (1) can be written,⁵

\[ x(t) = G(t - t_0)\,x(t_0) + \mu_0\,h(t - t_0) \tag{5} \]

where $G(t)$ is the fundamental matrix [9, 12] of the differential
equation (1) and is given by

\[ G(t) = \exp At \tag{6} \]

and $h(t)$ is the forcing vector, given by

\[ h(t) = \int_0^t G(\tau)\,b\,d\tau \tag{7} \]

Two properties which the fundamental matrix and the forcing
vector must satisfy are, from (5),

\[ G(0) = I, \quad h(0) = 0 \tag{8} \]

where $I$ is the identity matrix of the same order as $G$, namely, $n$.
Some other very useful properties are [1, 11]

\[ G(u + v) = G(u)\,G(v) \tag{9} \]

for any real numbers $u$, $v$. Hence, in particular, for $u = -v$, it
follows from (8) and (9) that

\[ G^{-1}(v) = G(-v) \tag{10} \]

and, more generally,

\[ G^{-n}(v) = G(-nv) \tag{11} \]

for any integer $n$ and real number $v$.
Similarly, for the forcing vector a useful property is that [1]

\[ h(u + v) = G(u)\,h(v) + h(u) \tag{12} \]

for any real numbers $u$, $v$. In particular, for $u = -v$, (8) and (12)
yield

\[ h(-v) = -G(-v)\,h(v) \tag{13} \]

⁵ A more precise mathematical notation would be to define a new
vector, say $s(t; x(t_0))$, which is a solution of (4), with fixed $\mu(t)$,
starting at state $x(t_0)$, and evaluated at time $t$. However, it is felt
that the more informal usage of $x$ as any state, and $x(t)$ as the solution
of (4) will not cause any loss of clarity.
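The properties of the fundamental matrix and the forcing vector can be checked on a concrete plant. The sketch below uses, purely as an illustration, the d-c motor plant treated in Example 1 of the next section, with $A$ having rows $(0, 1)$ and $(0, -1)$ and $b = (0, 1)$. For this plant $A^2 = -A$, so the closed forms written in the code follow directly from the series for $\exp At$ and from integrating $G(\tau)b$; they are worked out here and are not quoted from the paper. The helper names are hypothetical.

```python
import math


def G(t):
    """Fundamental matrix exp(At) for A = [[0, 1], [0, -1]]."""
    e = math.exp(-t)
    return [[1.0, 1.0 - e], [0.0, e]]


def h(t):
    """Forcing vector: integral of G(s) b from 0 to t, with b = (0, 1)."""
    e = math.exp(-t)
    return [t - 1.0 + e, 1.0 - e]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


def mat_mat(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def property_residual(u, v):
    """Largest absolute violation of properties (9), (10), (12), (13)."""
    errs = []
    # (9): G(u + v) = G(u) G(v)
    prod = mat_mat(G(u), G(v))
    errs += [abs(G(u + v)[i][j] - prod[i][j]) for i in range(2) for j in range(2)]
    # (10): G(-v) G(v) = I
    inv = mat_mat(G(-v), G(v))
    errs += [abs(inv[i][j] - (1.0 if i == j else 0.0))
             for i in range(2) for j in range(2)]
    # (12): h(u + v) = G(u) h(v) + h(u)
    gh = mat_vec(G(u), h(v))
    errs += [abs(h(u + v)[i] - (gh[i] + h(u)[i])) for i in range(2)]
    # (13): h(-v) = -G(-v) h(v)
    gmh = mat_vec(G(-v), h(v))
    errs += [abs(h(-v)[i] + gmh[i]) for i in range(2)]
    return max(errs)
```

Property (12) is the one that makes sampled-data analysis convenient: the effect of holding the input constant over two consecutive intervals can be composed from the effects over each interval separately.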


The stability of an equilibrium state located at the origin⁶ is
defined as follows [12, 13]:

Definition 1. The origin is said to be stable whenever, given
any $\epsilon > 0$, there exists a $\delta(\epsilon, t_0) > 0$ such that if, for a trajectory
$x(t)$ starting at $x(t_0)$, the condition $\|x(t_0)\| < \delta$ is satisfied, then
$\|x(t; x(t_0))\| < \epsilon$ is also satisfied for all $t \ge t_0$.

A stable equilibrium state at the origin is such that some initial
state yields future states which remain within a bounded region
about the origin. To describe systems which not only have
bounded behavior, but also tend to the equilibrium state, the
following definitions apply:

Definition 2. The origin is said to be asymptotically stable if it is
stable and if, in addition, $\|x(t; x(t_0))\| \to 0$ as $t \to \infty$.

Definition 3. The origin is said to be asymptotically stable in the
large if it is asymptotically stable for all $x(t_0)$ in the state space.

The abbreviation ASIL will be used hereafter to mean "asymptotically
stable in the large" or "asymptotic stability in the large,"
depending upon the context of the sentence in which it appears.


Continuous and Sampled-Relay Control 


The system description and terminology of the preceding sec- 
tion are illustrated in this section employing a relay-control sys- 
tem. The effect of sampling on the relay-control system serves 
as an introduction to the use of pulse-width control, which is 
discussed in the next section. 

The class of relay-control inputs is described in general by 


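\[ \mu(t) = M \operatorname{sgn} f(t), \quad \text{any } f(t) \tag{14} \]

where, by definition,

\[ \operatorname{sgn} f(t) = \begin{cases} +1, & f(t) > 0 \\ 0, & f(t) = 0 \\ -1, & f(t) < 0 \end{cases} \]

Equation (14) therefore describes the class of all piecewise constant
functions of time restricted for all time to amplitudes of
$+M$, $-M$, or zero.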

The relationship of the control input to the state of the plant 
is represented by introducing the feedback-control signal 


\[ \sigma(t) = f(x(t), t) \tag{15} \]

where $f$ is some scalar function of $x(t)$ and $t$. The control input
is then derived from this feedback signal, i.e.,

\[ \mu(t) = M \operatorname{sgn} \sigma(t) \tag{16} \]

The vector differential equation for the continuous relay-control
system is, from (4) and (16),

\[ \dot{x} = Ax + bM \operatorname{sgn} \sigma(t) \tag{17} \]


The relationship of this system description to the common
transfer-function description of control systems is illustrated by the
following example:

⁶ The definitions apply equally well to any finite equilibrium state
by employing a linear transformation which shifts the origin to the
equilibrium state.

Example 1. Consider the simple d-c motor or servomotor-type
plant. The differential equation governing the output, $c(t)$, of
the plant is

$$\ddot{c}(t) + \dot{c}(t) = u(t) \tag{18}$$

where $u(t) = M \operatorname{sgn} \sigma(t)$.
The feedback-control signal is derived from the reference input⁷
and a linear combination of the output and its derivative

$$\sigma(t) = r(t) - K_1 c(t) - K_2 \dot{c}(t) \tag{19}$$

Defining the state variables,

$$x_1(t) = c(t) \quad \text{and} \quad x_2(t) = \dot{c}(t),$$

the differential equation of the system of this example, (18), can be
written

$$\begin{aligned} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= -x_2 + u(t) \end{aligned} \tag{20}$$

which is the vector form (1), with

$$A = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

In terms of the state variables, the feedback-control signal
becomes

$$\sigma(t) = r(t) - K_1 x_1(t) - K_2 x_2(t) \tag{21}$$


Assuming that r(t) = 0, the equilibrium state is at the origin of 
the state space. A family of trajectories [solutions of (20) for 
various initial states] is shown in Fig. 1. These trajectories, as 
well as others given in this paper, were obtained experimentally 
using an actual relay controller which has an inactive zone as 
shown in the figures; this zone was sufficiently small, however, 
that the response of the system remains essentially that of a 
system with the ideal relay-control input (16). 

Since all trajectories lead to the equilibrium state, $x_e = 0$,
regardless of the initial state, the control policy defined by the
reversal curve shown in Fig. 1 produces a system whose
equilibrium state is ASIL. Note that trajectories which intersect the
reversal curve in the region marked A-A' follow the reversal
curve into the origin. In this region the slope of the trajectories
is more negative than the slope of the reversal curve, causing the
relay to reverse continuously, or "chatter," until the equilibrium
state is reached.

When the relay control is subject to sampled data, the control
inputs are restricted not only in amplitude but also in time; i.e.,
they are restricted to change value only at the periodic sampling
instants. Therefore, the control input over the kth sampling
period is specified by

$$u(t) = u(k) = M \operatorname{sgn} \sigma(k), \qquad kT \le t < (k+1)T \tag{22}$$

where

$$\sigma(k) = f(x(kT)).$$

From the time solution, (5), the difference equation for the
sampled-data, relay-control system is then

$$x(k+1) = G(T)x(k) + h(T)u(k) \tag{23}$$

where $x(k)$ implies $x(kT)$.

⁷ By Assumption I, $r(t) = 0$. Note the difference between the
reference input, $r(t)$, and the control input, $u(t)$, as shown by (18)
and (19).

Example 2. To illustrate the effect of sampling on a relay-control
system which for continuous control is ASIL, consider the
system of Example 1, with the addition of a sample-and-hold unit
which converts the continuous-control signal, $\sigma(t)$, into the
piecewise-constant control signal, $\sigma(k)$. The state of the system at
each sampling instant is given by equation (23) with

$$G(T) = \begin{bmatrix} 1 & 1 - e^{-T} \\ 0 & e^{-T} \end{bmatrix}, \qquad h(T) = \begin{bmatrix} T - 1 + e^{-T} \\ 1 - e^{-T} \end{bmatrix} \tag{24}$$

and with the control input given by

$$u(k) = M \operatorname{sgn} \sigma(k) = M \operatorname{sgn} a'x(k) \tag{25}$$

where $a'$ is the transpose of the feedback-coefficient vector,
which from (21) with $r(t) = 0$ is

$$a = -\begin{bmatrix} K_1 \\ K_2 \end{bmatrix}$$


Fig. 1 Trajectories for continuous relay-control system, example 1

Fig. 2 Typical trajectories for sampled-relay-control system, example 2

As is apparent from (25), the control input to the plant can
change only at the sampling instants and not in general at the
instants at which the trajectories cross the reversal curve. Two
typical trajectories for this example are shown in Fig. 2, one of
which goes into a limit cycle of period 4T sec, the other into
a limit cycle of period 2T sec. The equilibrium state at the origin
is not asymptotically stable since any perturbation of the system
from the origin tends to a limit cycle of the form indicated in
Fig. 2.
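The loss of asymptotic stability under sampling can be reproduced numerically. The following is a minimal sketch, assuming the plant of Example 1 with the transition quantities of (24); the gains `K`, the initial state, and the function names are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def transition(T):
    """G(T) and h(T) of (24) for the plant c'' + c' = u."""
    e = math.exp(-T)
    G = [[1.0, 1.0 - e], [0.0, e]]
    h = [T - 1.0 + e, 1.0 - e]
    return G, h

def sgn(v):
    """Relay characteristic of (14): +1, 0, or -1."""
    return (v > 0) - (v < 0)

def sampled_relay(x0, K, M=1.0, T=1.0, steps=200):
    """Iterate (23) with u(k) = M sgn(a'x(k)), a = -[K1, K2]' (r = 0)."""
    G, h = transition(T)
    x = list(x0)
    states, inputs = [], []
    for _ in range(steps):
        u = M * sgn(-(K[0] * x[0] + K[1] * x[1]))
        x = [G[0][0] * x[0] + G[0][1] * x[1] + h[0] * u,
             G[1][0] * x[0] + G[1][1] * x[1] + h[1] * u]
        states.append(x)
        inputs.append(u)
    return states, inputs

# Hypothetical gains and initial state, chosen only for illustration.
states, inputs = sampled_relay(x0=[1.3, -0.4], K=[1.0, 0.7])
closest = min(math.hypot(*x) for x in states[-50:])
print("closest approach to the origin over the last 50 samples:", closest)
```

With full-amplitude inputs applied for a whole sampling period, the velocity state cannot settle at zero, so the distance from the origin stays bounded away from zero however long the simulation runs.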

More generally, any nth-order relay-control system of the type
described in this section which has asymptotic stability with
continuous control will not be asymptotically stable when subject
to sampling [1].

This lack of asymptotically stable control is an inherent result 
of the interaction of on-off control and sampling. Since the con- 
troller is capable of applying only the full positive or negative 
input to the plant, and since the input is based on the state of the 
plant only at sampling instants, the necessary condition (2) for 
the system to be in equilibrium at the origin is satisfied only for 
the particular case in which the state of the plant reaches the 
origin at a time coinciding with a sampling instant. 

With an arbitrary perturbation from equilibrium, the system 
will not, in general, return to equilibrium, but will at best exhibit 
a bounded periodic trajectory, or limit cycle, about the equi- 
librium state. Since small excursions about the equilibrium state 
may be tolerable, it is desirable to obtain an analytical description 
of limit-cycle behavior characteristic to the system in order that it 
may be evaluated and reduced to the minimum possible region 
about the equilibrium state. 

A characteristic class of limit cycles for which the
relay-control-input sequence can be specified is the class of symmetric limit cycles
of half-period $mT$, where $m = 1, 2, 3, \ldots$, and $T$ is the sampling
period. Limit cycles in this class are defined by the following
conditions:

(i) The feedback-control signal, $\sigma(k) = f(x(k))$, defines a
scalar function on state space such that for all $x$,

$$\operatorname{sgn} f(x) = -\operatorname{sgn} f(-x) \tag{26}$$

Since the input to the plant is $u(k) = M \operatorname{sgn} \sigma(k)$, any two states
symmetric to the origin will generate inputs of the opposite
polarity.

(ii) The states in each half period are symmetric with respect
to the origin (i.e., the equilibrium state). Therefore, for a limit
cycle of half-period $mT$,

$$x(k + m) = -x(k) \tag{27}$$

(iii) The control input to the plant, $u(k)$, changes polarity only
at the beginning of each half period of the limit cycle.

Let $x(r)$ be the state of the system at some rth sampling instant
at which the sign of the feedback-control function changes from
positive to negative; i.e.,

$$f(x(r - 1)) > 0, \qquad f(x(r)) < 0 \tag{28}$$

Then from (23) and condition (iii),

$$x(r + m) = G(mT)x(r) - M h(mT) \tag{29}$$

for some $m = 1, 2, 3, \ldots$
If the states $x(r), \ldots, x(r + m)$ are states of a symmetric limit
cycle of half-period $mT$, then from (27)

$$x(r + m) = -x(r) \tag{30}$$

To satisfy (29) and (30), $x(r)$ must be given by

$$x(r) = M[I + G(mT)]^{-1} h(mT) \tag{31}$$

where $I$ = identity matrix, $m = 1, 2, 3, \ldots$
Therefore, the state $x(r)$ given by (31) is a reversal state on a
symmetric limit cycle of half-period $mT$, provided the
feedback-control-function condition (28) is satisfied. The design procedure
for the elimination of limit cycles is then based upon specifying
a feedback-control signal, $\sigma(k) = f(x(k))$, such that reversal of
the control input cannot take place at the reversal state required
by (31) for each limit cycle; i.e., such that the necessary
condition (28) is not satisfied. The following example illustrates this
procedure.

Example 3. Consider the relay-control system of Example 2.
The feedback-control function for this example is simply the
linear combination, $\sigma(k) = a'x(k)$, given in (25).

Since for this control function,

$$a'x = -a'(-x)$$

the condition (26) is satisfied. From (24) and (31) one reversal
state for a symmetric limit cycle of half-period $mT$ is computed
to be

$$x(r) = M \begin{bmatrix} \dfrac{mT}{2} - \dfrac{1 - e^{-mT}}{1 + e^{-mT}} \\[2ex] \dfrac{1 - e^{-mT}}{1 + e^{-mT}} \end{bmatrix} \tag{32}$$

for all $m = 1, 2, 3, \ldots$ for which

$$a'x(r - 1) > 0 \quad \text{and} \quad a'x(r) < 0 \tag{33}$$

Fig. 3 Symmetric limit cycles for sampled-relay-control system
considered in example 3

These limit cycles for m = 1, 2, and 3 are shown in Fig. 3, where
the relay reversal line, $a'x = 0$, is such that (33) is satisfied for all
three of these limit cycles. The effect of the control-function
vector, $a$, upon the limit cycles is demonstrated in Fig. 4, where the
components of $a$ are changed to eliminate all but the m = 1 limit
cycle, since the necessary condition (33) is now satisfied only for
m = 1. This minimum-period (2T) limit cycle cannot be
eliminated by adjustment of the feedback-control parameters (the
components of $a$) since with $m = 1$, $x(r - 1) = -x(r)$, and (33)
is satisfied regardless of the choice of $a$.
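The reversal state required by (31) can be checked numerically for the second-order plant of Examples 1 to 3. The sketch below assumes the transition quantities $G$ and $h$ of (24); the function names are hypothetical. It solves (31) directly, compares the result with the equivalent closed form $M[mT/2 - \tanh(mT/2),\ \tanh(mT/2)]'$, and confirms that $m$ periods of $-M$ input carry $x(r)$ to $-x(r)$, as (30) requires.

```python
import math

def reversal_state(m, M=1.0, T=1.0):
    """Solve [I + G(mT)] x = M h(mT), i.e. (31), by back substitution."""
    t = m * T
    e = math.exp(-t)
    h1, h2 = t - 1.0 + e, 1.0 - e
    x2 = M * h2 / (1.0 + e)                   # second row: (1 + e) x2 = M h2
    x1 = (M * h1 - (1.0 - e) * x2) / 2.0      # first row: 2 x1 + (1 - e) x2 = M h1
    return [x1, x2]

def closed_form(m, M=1.0, T=1.0):
    """Same state written with tanh(mT/2) = (1 - e^{-mT}) / (1 + e^{-mT})."""
    th = math.tanh(m * T / 2.0)
    return [M * (m * T / 2.0 - th), M * th]

def advance(x, u, T=1.0):
    """One sampling period of (23) with constant input u."""
    e = math.exp(-T)
    return [x[0] + (1.0 - e) * x[1] + (T - 1.0 + e) * u,
            e * x[1] + (1.0 - e) * u]

for m in (1, 2, 3):
    xr = reversal_state(m)
    x = xr
    for _ in range(m):
        x = advance(x, -1.0)   # input held at -M over the half period
    print(m, xr, x)
```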

If the excursions about the equilibrium state resulting from this
inherent 2T-period limit cycle are within the tolerances of
accuracy of the specific control problem, then simple relay control
in a sampling system may be adequate, provided the
feedback-control function which defines the reversal surface in state space
is chosen to eliminate all higher-period limit cycles. This can
be accomplished by the method outlined, on the assumption that
the control function which eliminates the higher-period,
symmetric limit-cycle behavior also eliminates all limit cycles of
period greater than 2T.

This is a reasonable assumption for all plants having
eigenvalues with negative real parts, where the equilibrium states for
$\pm M$-control inputs, given from (1) by

$$x_e = \mp M A^{-1} b$$

are stable nodes or foci [10] and are finite and symmetric. Since
the limit cycles represent matching segments of trajectories
toward the two symmetric $\pm M$-equilibrium states, it follows that
the limit cycles, if stable, will be symmetric.

In the case of plants having eigenvalues at zero, the A-matrix
of the plant is singular and the $\pm M$-equilibrium states toward
which the trajectories tend are not finite. The limit cycles which
can exist in this case are not, in general, symmetric limit cycles,
and the intuitive hypothesis that the choice of a feedback-control
function which eliminates all higher-order symmetric limit cycles
will also eliminate all higher-order nonsymmetric limit cycles has
not as yet been proved in general and is the subject of present
research by the author.

The plant of Example 3 is an example of this case, since it has
an eigenvalue at zero. Selecting the feedback-control function
$f(x) = a'x$ such that all higher-order symmetric limit cycles are
eliminated, as shown in Fig. 4, it was found experimentally that
all higher-order limit cycles were eliminated. All trajectories
tend to one of the family of possible 2T-limit cycles shown in
Fig. 5.

Fig. 4 Linear feedback function, $a'x$, adjusted to eliminate higher-order
limit cycles


If this inherent limit-cycle behavior is not tolerable in a 
sampling-control system, the relay controller may be replaced by 
some continuous-amplitude controller, such as a power amplifier, 
or some means provided to allow continuous variation of the 
energy supplied to the plant by the relay controller. The latter 
method of control is the one considered in this paper. 


Pulse-Width Control 


A modification of the sampled relay control which is compatible
with practical relay controllers and yet which permits the
continuous variation of the energy supplied to the plant is
pulse-width control. With this class of control inputs, the input to the
plant at the kth sampling period is

$$u(t) = \begin{cases} u(k), & kT \le t < kT + \tau_k \\ 0, & kT + \tau_k \le t < (k+1)T \end{cases} \tag{34}$$

where

$$u(k) = M, -M, 0 \quad \text{and} \quad 0 \le \tau_k \le T.$$

Although either the start or termination of the input pulse
could be chosen to vary the pulse duration [7], all of the
pulse-width control inputs considered in this work will be in the class
defined by (34); i.e., input pulses starting at the sampling instant
and terminating $\tau_k$ sec later.

Fig. 5 Trajectories for sampled-relay control of the system considered
in example 3, with T = 1 sec

When it is desirable to show explicitly the relation of the
pulse-width control input to the state of the plant, the
sampled-feedback-control function, $\sigma(k)$, may be used. The control input for
the kth sampling period is then written as

$$u(t) = \begin{cases} M \operatorname{sgn} \sigma(k), & kT \le t < kT + \tau_k \\ 0, & kT + \tau_k \le t < (k+1)T \end{cases} \qquad \tau_k = T \operatorname{sat} \frac{|\sigma(k)|}{\beta} \tag{35}$$

where

$$\operatorname{sat} z = \begin{cases} +1, & z > 1 \\ z, & |z| \le 1 \\ -1, & z < -1 \end{cases}$$

and where $\sigma(k) = f(x(k))$ denotes a linear or nonlinear scalar
function of the state. It may, for example, be a number received
every $T$ seconds from a digital computer, with the sign of the
number determining the pulse polarity and the magnitude of
the number determining the pulse duration. The factor $\beta$ in (35)
is a positive constant scale factor included for convenience. The
practical design of the controller used in the experimental
apparatus for this study is given in Appendix 1.
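The mapping from the sampled feedback signal to a pulse, as described by (35), can be sketched in a few lines. The function names `sat` and `pulse` are hypothetical, and the numerical values in the usage lines are arbitrary illustrations rather than values from the paper.

```python
def sat(z):
    """Saturation function used in (35): clips z to the interval [-1, 1]."""
    return max(-1.0, min(1.0, z))

def pulse(sigma, M, T, beta):
    """Return (amplitude, duration) of the pulse applied during one sampling period.

    The sign of sigma selects the polarity, and |sigma| / beta selects the
    fraction of the sampling period T during which the relay is closed.
    """
    amplitude = M * ((sigma > 0) - (sigma < 0))
    duration = T * sat(abs(sigma) / beta)
    return amplitude, duration

print(pulse(0.25, M=10.0, T=1.0, beta=0.5))   # inside the proportional region
print(pulse(-3.0, M=10.0, T=1.0, beta=0.5))   # saturated: ordinary relay action
print(pulse(0.0, M=10.0, T=1.0, beta=0.5))    # on the reversal surface: no pulse
```

For $|\sigma| \ge \beta$ the duration equals the full sampling period, so the controller reduces to the sampled relay of (22).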

Writing the solution of the vector differential equation (1) over
the intervals for which the control input is constant gives, from
(5) and (34),

$$x(kT + \tau_k) = G(\tau_k)x(kT) + h(\tau_k)u(k)$$

$$x((k + 1)T) = G(T - \tau_k)x(kT + \tau_k) + 0$$

Combining these and using property (9) of the fundamental
matrix, the difference equation over the kth sampling period for
pulse-width control is

$$x(k + 1) = G(T)x(k) + u(k)G(T - \tau_k)h(\tau_k) \tag{36}$$

Employing properties (9) and (13), the difference equation (36)
can also be written

$$x(k + 1) = G(T)[x(k) - u(k)h(-\tau_k)] \tag{37}$$

As is apparent from comparing these difference equations for
pulse-width control with the difference equation (23) for ideal
relay control, a significant effect of adding pulse-width control
has been to change the forcing vector from a constant vector,
$h(T)$, to a vector which is a function of the feedback-control
signal, $\sigma(k)$. This effect, while increasing the complexity of the
analysis, is also the necessary factor for achieving asymptotic
stability in on-off control systems subject to sampling, since it
allows the magnitude of the forcing vector to be varied
continuously over a limited range as $\tau$ varies from 0 to $T$.

Example 4. The manner in which pulse-width control can be
used to eliminate the inherent limit-cycle behavior in
relay-control systems subject to sampling is illustrated for the same system
considered in the examples of the previous section. Assuming the
same linear feedback function used to eliminate all but the
2T-limit cycles, as shown in Figs. 4 and 5, the pulse-width-controlled
relay input to the plant at the kth sampling instant is given by
(35) with $\sigma(k) = a'x(k)$. The mapping of this control function
on the state space of the system is shown in Fig. 6.

Referring to Fig. 6, the pulse-width control function creates a
dual-mode operation in which region I corresponds to ordinary
relay action ($\tau = T$) and region II represents a region of reduced
energy input to the plant ($\tau < T$). From the analysis in Example
3, the reversal line, $a'x = 0$, can be adjusted to eliminate all but
the 2T-period limit cycles, Figs. 3-5. The pulse-width control
region II can then be adjusted by varying the scale factor, $\beta$, to
eliminate the 2T-limit cycle. Figs. 7, 8, and 9 illustrate the effect
of increasing the extent of region II on the symmetric 2T-limit
cycle.

For comparison with the relay-control-system trajectories
shown in Fig. 5, the effect of the addition of pulse-width control
is shown in Fig. 10. It is noted that the somewhat intuitive
design procedure based upon the elimination of symmetric limit
cycles has resulted in the elimination of all limit-cycle behavior
and a system response similar to that obtained for the same plant
with continuous relay control, Fig. 1.

Fig. 6 Mapping of the pulse-width-control function (35) on the state
space for example 4

Although this design procedure can be applied to higher-order
systems, the selection of the reversal surface and the extent of
the pulse-width-control region about this reversal surface in
n-dimensional state space would require tedious trial-and-error
selection and evaluation, rather than the simple graphical evaluation
possible in two-dimensional state space, as illustrated by the
foregoing example.

It is desirable, therefore, to establish for broad classes of
nth-order plants sufficient conditions on the feedback-control
function, $\sigma(k) = f(x(k))$, which determines the pulse-width control
input (35), such that the system is ASIL.

Denoting the Euclidean norm of any state vector, $x$, by
$\|x\| = (x'x)^{1/2}$, from Definition 3 in the section, "System Description and
Terminology," the equilibrium state, $x_e = 0$, is ASIL if and only if

$$\lim_{k \to \infty} \|x(k, x(0))\| = 0, \qquad \text{any } x(0) \tag{38}$$

For the pulse-width-controlled, sampled-data system described
by (36) a sufficient condition which satisfies (38) is

$$\|x(k + 1)\| < \|x(k)\|, \qquad \text{all } x \ne 0, \quad k = 0, 1, 2, \ldots \tag{39}$$

i.e., the distance from equilibrium decreases with every successive
sampling period, regardless of the present state of the system.

Fig. 7 Symmetric 2T-period limit cycle unaffected by pulse-width control

Fig. 8 Symmetric 2T-period limit cycle eliminated by pulse-width control
with $\beta = 0.6M$

Fig. 9 Trajectory and waveforms showing effect on the symmetric
2T-period limit cycle with the pulse-width scale factor, $\beta = M$


Fig. 10 Trajectories for pulse-width control of the system considered in
example 4, with T = 1 sec, $\beta = M$

However, for some systems this condition may be too restrictive
in the sense that control functions satisfying the necessary and
sufficient condition (38) may not satisfy (39). The second
method of Lyapunov [14, 15] relaxes condition (39) by permitting
the use of other scalar functions on $x$ in addition to the Euclidean
norm. The method can be applied to determine the stability of
the origin as follows:

Let $V(x)$ be a scalar function defined at every vector, $x$, in
state space. If $V(x)$ satisfies the conditions:

(a) $V(x) > 0$ for $x \ne 0$

(b) $V(x) = 0$ for $x = 0$

(c) $V(x)$ is continuous at $x = 0$

(d) $V(x) \to \infty$ as $\|x\| \to \infty$

then the equilibrium state, $x_e = 0$, is ASIL if [16]

$$\Delta(V) = V(x(k + 1)) - V(x(k)) < 0; \qquad \text{all } x \ne 0, \quad k = 0, 1, 2, \ldots \tag{40}$$

This function V(x) resembles an energy function in the state 
space which has an extremum at the origin toward which all 
initial states tend if the origin is ASIL for the control inputs ap- 
plied. If V(x) may be chosen so that V(x) = (constant) repre- 
sents a family of concentric closed surfaces surrounding the origin, 
and for which the constant defining the surface increases as ||x|| 
increases, then this function defines an order on every state, the 
decreasing value of which, for every successive sampling period, 
ensures that the trajectory is tending to the origin. 

The difficulty in the method lies in selecting a particular $V(x)$
for which the sufficient condition (40) can be established for all $x$
in the state space. The difficulty is further increased if the
nonlinear relationship of the control signal to the state is included in
the difference equation of the system. As mentioned in the
introduction, the "open-loop" control approach is the more fruitful one
in this method; i.e., to consider first the behavior of the plant for
a general class of control signals, and develop from this analysis
requirements on the control signals which can be related to the
state of the plant.

The application of the Lyapunov method is now considered for
the particular class of systems described by

$$\dot{x} = Ax + bu(t)$$

which satisfies the conditions:

1 The matrix $A$ is similar to a diagonal matrix $D$; that is,
there exists a matrix $P$ such that $P^{-1}AP = D$.

2 The characteristic roots, $\lambda_i$, of the matrix $A$ are all real
and negative, and are denoted by

$$\lambda_i = -\alpha_i, \qquad \alpha_i > 0, \quad i = 1, 2, \ldots, n$$

The diagonalizing transformation $x = Py$ is used to map the
state space, X, onto the state space Y. For all $y$ the
pulse-width-controlled system becomes, from (36),

$$y(k + 1) = E(T)y(k) + u(k)E(T - \tau_k)P^{-1}h(\tau_k) \tag{41}$$

where

$$E(T) = P^{-1}G(T)P = \begin{bmatrix} e^{-\alpha_1 T} & & 0 \\ & \ddots & \\ 0 & & e^{-\alpha_n T} \end{bmatrix}$$

Since the equilibrium state $y_e = 0$ is equivalent to $x_e = 0$ in
the original state space, a sufficient condition for ASIL of this
class of systems is, for some $V(y)$ satisfying the foregoing
conditions (a) through (d):

$$\Delta(V) = V(y(k + 1)) - V(y(k)) < 0, \qquad \text{all } y \ne 0, \quad k = 0, 1, 2, \ldots \tag{42}$$

Let the scalar function on $y$ be the square of the Euclidean
norm in Y space⁸

$$V(y) = y'y$$

which satisfies conditions (a) through (d). From (41) and (42),
with the columns of $P$ scaled so that $P^{-1}b$ is the n-vector of ones,
the sufficient condition for ASIL can be written in the explicit
scalar form

$$-\sum_{i=1}^{n} \left(1 - e^{-2\alpha_i T}\right) y_i^2 + 2u \sum_{i=1}^{n} e^{-2\alpha_i T} y_i \left(\frac{e^{\alpha_i \tau} - 1}{\alpha_i}\right) + u^2 \sum_{i=1}^{n} e^{-2\alpha_i T} \left(\frac{e^{\alpha_i \tau} - 1}{\alpha_i}\right)^2 < 0 \tag{43}$$

where $y_i(k)$, $u(k)$, and $\tau_k$ have been abbreviated to $y_i$, $u$, and $\tau$,
respectively, for convenience. The derivation of (43) is given in
Appendix 2.
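As a numerical cross-check of the scalar form (43), the sketch below evaluates $V(y(k+1)) - V(y(k))$ directly from the modal update and compares it with the three-term expression. It assumes, as in the all-ones vector introduced with (48), that each modal equation is $\dot{y}_i = -\alpha_i y_i + u$; the function names and the test values are hypothetical.

```python
import math

def modal_step(y, alphas, u, tau, T):
    """One sampling period in modal coordinates for a pulse of height u, width tau."""
    return [math.exp(-a * T) * (yi + u * (math.exp(a * tau) - 1.0) / a)
            for a, yi in zip(alphas, y)]

def delta_v_direct(y, alphas, u, tau, T):
    """V(y(k+1)) - V(y(k)) with V(y) = y'y."""
    y_next = modal_step(y, alphas, u, tau, T)
    return sum(v * v for v in y_next) - sum(v * v for v in y)

def delta_v_43(y, alphas, u, tau, T):
    """Left-hand side of (43), term by term."""
    first = -sum((1.0 - math.exp(-2.0 * a * T)) * yi * yi
                 for a, yi in zip(alphas, y))
    second = 2.0 * u * sum(math.exp(-2.0 * a * T) * yi * (math.exp(a * tau) - 1.0) / a
                           for a, yi in zip(alphas, y))
    third = u * u * sum(math.exp(-2.0 * a * T) * ((math.exp(a * tau) - 1.0) / a) ** 2
                        for a in alphas)
    return first + second + third

args = ([0.3, -0.2, 0.1], [1.0, 2.0, 3.0], -10.0, 0.05, 1.0)
print(delta_v_direct(*args), delta_v_43(*args))
```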

The first term in (43) is clearly negative for all $y \ne 0$, since all
$\alpha_i$ are positive, while the third term is positive for all $u \ne 0$,
$\tau \ne 0$. The direct solution of (43) for the relationships the input
parameters $u$ and $\tau$ must have to the state, $y$, in order to insure
a negative value of $\Delta(V)$, is not possible because of the existence
of $\tau$ in the exponents of the second and third terms. However, a
solution can be obtained if the exponential terms in $\tau$ are
approximated by replacing them with the linear terms of their series
expansion; that is, if it is assumed that $e^{\alpha_i \tau} \approx 1 + \alpha_i \tau$. Then

$$\frac{e^{\alpha_i \tau} - 1}{\alpha_i} \approx \frac{1 + \alpha_i \tau - 1}{\alpha_i} = \tau \tag{44}$$

The error in the approximation of (44) is less than 5 per cent for
all cases where $\alpha_i \tau \le 0.1$, but greater than 70 per cent for $\alpha_i \tau = 1$.
A discussion of the implication of this error follows the
development of the approximate results for this method.
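⁸ This corresponds to defining the scalar function on X space,
$V(x) = x'Qx$, where $Q = (P^{-1})'(P^{-1})$ is symmetric positive definite.

Applying the approximation (44) to (43), and using the symbol
$\Delta(\tilde{V})$ to denote the approximate function, then

$$\Delta(\tilde{V}) = -\sum_{i=1}^{n} \left(1 - e^{-2\alpha_i T}\right) y_i^2 + 2(u\tau) \sum_{i=1}^{n} e^{-2\alpha_i T} y_i + (u\tau)^2 \sum_{i=1}^{n} e^{-2\alpha_i T} \tag{45}$$

which is a quadratic form in $(u\tau)$, the "area" of the input pulse.
With $\tau = 0$, $\Delta(\tilde{V})$ is negative for all $y \ne 0$. The derivative of
$\Delta(\tilde{V})$ with respect to $\tau$,

$$\frac{\partial}{\partial \tau} \Delta(\tilde{V}) = 2u \sum_{i=1}^{n} e^{-2\alpha_i T} y_i + 2u^2 \tau \sum_{i=1}^{n} e^{-2\alpha_i T}$$

is negative at $\tau = 0$, provided

$$u = M \operatorname{sgn}\left(-\sum_{i=1}^{n} e^{-2\alpha_i T} y_i\right) \tag{46}$$

and is zero at a value $\tau = \tau_{opt}$ given by

$$\tau_{opt} = \frac{\left|\sum_{i=1}^{n} e^{-2\alpha_i T} y_i\right|}{M \sum_{i=1}^{n} e^{-2\alpha_i T}}$$

This is indicated as the optimum value of $\tau$ since it gives the
maximum negative value of $\Delta(\tilde{V})$ for any $y$.
Since the range of $\tau$ is restricted to $0 \le \tau \le T$, the best choice
of $\tau$ is

$$\tau = T \operatorname{sat} \frac{\left|\sum_{i=1}^{n} e^{-2\alpha_i T} y_i\right|}{MT \sum_{i=1}^{n} e^{-2\alpha_i T}} \tag{47}$$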
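The choice of polarity and pulse width in (46) and (47) can be illustrated numerically. The sketch below evaluates the approximate function (45) in modal coordinates and confirms, on a grid of admissible pulse widths, that the unsaturated optimum gives the most negative value. The function names, the state `y`, and the roots `alphas` are hypothetical values chosen for illustration.

```python
import math

def weights(alphas, T):
    """The factors e^{-2 alpha_i T} that appear in (45) to (47)."""
    return [math.exp(-2.0 * a * T) for a in alphas]

def delta_v_approx(y, alphas, u, tau, T):
    """Approximate Lyapunov difference (45), a quadratic in the pulse area u*tau."""
    w = weights(alphas, T)
    area = u * tau
    first = -sum((1.0 - wi) * yi * yi for wi, yi in zip(w, y))
    cross = sum(wi * yi for wi, yi in zip(w, y))
    return first + 2.0 * area * cross + area * area * sum(w)

def best_pulse(y, alphas, M, T):
    """Polarity from (46) and width from (47)."""
    w = weights(alphas, T)
    cross = sum(wi * yi for wi, yi in zip(w, y))
    u = M * ((-cross > 0) - (-cross < 0))
    tau_opt = abs(cross) / (M * sum(w))
    return u, min(tau_opt, T)   # T sat(tau_opt / T) for tau_opt >= 0

y, alphas, M, T = [0.3, -0.2, 0.1], [1.0, 2.0, 3.0], 10.0, 1.0
u, tau = best_pulse(y, alphas, M, T)
print(u, tau, delta_v_approx(y, alphas, u, tau, T))
```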


From (46) and (47), the pulse-width control function for stable
control based on the approximate analysis can be derived from
the linear feedback-control function

$$\sigma(k) = -\frac{\sum_{i=1}^{n} e^{-2\alpha_i T} y_i(k)}{\sum_{i=1}^{n} e^{-2\alpha_i T}} = -\frac{\mathbf{u}'E(2T)y(k)}{\mathbf{u}'E(2T)\mathbf{u}} \tag{48}$$

where $E(T)$ is the diagonalized transition matrix, given under
(41), and $\mathbf{u}$ is an n-vector all of whose elements are unity; i.e.,
$\mathbf{u}' = [1, 1, \ldots, 1]$.

Since

$$\mathbf{u}'E(2T)\mathbf{u} = \sum_{i=1}^{n} e^{-2\alpha_i T}$$

is a positive constant, then the amplitude, $u(k)$, of the
pulse-width-control input, satisfying condition (46) is, in terms of the
control signal (48),

$$u(k) = M \operatorname{sgn} \sigma(k)$$

and the pulse duration is

$$\tau_k = T \operatorname{sat} \frac{|\sigma(k)|}{\beta} \tag{49}$$

where $\beta = MT$, the maximum area of input signal.
In terms of the original state space, X, the relationship of the
linear feedback function to the state $x(k)$ can be written in the
form

$$\sigma(k) = a'x(k) = a'Py(k)$$

From (48) it follows that the feedback-coefficient vector is
given by

$$a = -\frac{(P^{-1})'E(2T)\mathbf{u}}{\mathbf{u}'E(2T)\mathbf{u}} \tag{50}$$

and, as given previously, the pulse-width scale factor is

$$\beta = MT$$

These results specify a reversal surface, $a'x = 0$, in the state
space which determines the polarity of the pulse input to the
plant, and a bounded region, $-\beta < a'x < \beta$, along the reversal
surface in which the pulse width varies in direct proportion to the
scalar function, $a'x$. From (50) it is seen that the extent of this
region is increased if either the relay output, $\pm M$, or the sampling
period, $T$, is increased. This reduction of energy into the plant
in a region near the reversal surface serves to minimize the
overshoot of the reversal surface which occurs with sampled-data
relay control and which leads to limit-cycle behavior such as
illustrated in Figs. 2 and 5.

While these results were based on the approximation given by 
(44) they should nevertheless be sufficiently accurate for asymp- 
totically stable pulse-width control of plants whose poles are all 
simple, real, and negative. This conclusion is based on the fol- 
lowing points: 


(a) The requirement that the Lyapunov function V(y) de- 
crease with every successive sampling period is sufficient but not 
necessary for stable control. Therefore, an approximate control 
function can be adequate even though it does not guarantee the 
decrease of the function over every sampling period. 

(b) The critical region where accurate control becomes most
necessary is the region near the origin, and it is in this region
where the error in the approximation (44) becomes the least
since, from (47), the pulse width, $\tau$, goes to zero as the state, $y$,
approaches the origin.

(c) The error in the approximation is least for the dominant
poles of the plant, i.e., the poles near zero, since $\alpha_i \tau$ is then small.


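The application of this method to a system higher than second
order is illustrated by the following example:

Example 5. Consider the third-order plant specified by

$$\begin{aligned} x_1(t) &= c(t) \\ \dot{x}_1 &= -x_1 + x_2 \\ \dot{x}_2 &= -2x_2 + x_3 \\ \dot{x}_3 &= -3x_3 + u(t), \qquad u(t) = \pm M, 0, \quad M = 10 \end{aligned}$$

The inverse of the diagonalizing transformation and the
diagonal fundamental matrix for this plant are

$$P^{-1} = \begin{bmatrix} 2 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E(T) = \begin{bmatrix} e^{-T} & 0 & 0 \\ 0 & e^{-2T} & 0 \\ 0 & 0 & e^{-3T} \end{bmatrix}$$

The feedback-coefficient vector derived from the Lyapunov
method is then computed from (50) to be

$$a = -\frac{1}{e^{-2T} + e^{-4T} + e^{-6T}} \begin{bmatrix} 2e^{-2T} \\ 2e^{-2T} + e^{-4T} \\ e^{-2T} + e^{-4T} + e^{-6T} \end{bmatrix}$$

If the sampling period is $T = 1$ sec, the transpose of this
vector is

$$a' = [-1.73, -1.85, -1.0]$$

The feedback-control function, $\sigma(k)$, for this example is then

$$\sigma(k) = -1.73x_1(k) - 1.85x_2(k) - 1.0x_3(k)$$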
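The coefficients of Example 5 can be reproduced from (50) with a short calculation. In this sketch the matrix `P_inv` was recomputed from the plant equations under the assumption that $P$ is scaled so that $P^{-1}b$ is the all-ones vector; that normalization and the function name are assumptions made here, introduced because the printed matrix is only partly legible.

```python
import math

def lyapunov_feedback(P_inv, alphas, T):
    """Feedback-coefficient vector a of (50), with u the n-vector of ones.

    Component j of (P^{-1})' E(2T) u is sum_i P_inv[i][j] * exp(-2 alpha_i T).
    """
    w = [math.exp(-2.0 * a * T) for a in alphas]
    n = len(alphas)
    total = sum(w)
    return [-sum(P_inv[i][j] * w[i] for i in range(n)) / total for j in range(n)]

# Plant of Example 5: characteristic roots -1, -2, -3 and T = 1 sec.
P_inv = [[2.0, 2.0, 1.0],
         [0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0]]
a = lyapunov_feedback(P_inv, alphas=[1.0, 2.0, 3.0], T=1.0)
print([round(v, 2) for v in a])
```

Rounded to two decimals, the result matches the coefficient vector quoted for $T = 1$ sec.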


The feedback-control function $\sigma(k)$ defines the reversal surface, $\sigma = 0$, and the
pulse-width-control region, $-\beta < \sigma < \beta$, where $\beta = MT = 10$. In
order to avoid the need for amplification in the feedback
elements, we may reduce $\sigma$ and $\beta$ by the same factor without
affecting the reversal surface or the extent of the pulse-width-control
region. Dividing both functions by 1.85 gives

$$\begin{aligned} \sigma(k) &= -0.94x_1(k) - 1.0x_2(k) - 0.54x_3(k) \\ \beta &= 5.4 \end{aligned} \tag{51}$$

The complete pulse-width-control system based on this
design method is shown in analog-computer form in Fig. 11, where
the pulse-width controller (PWC) has the properties (49).

The reversal surface obtained by this method can be checked
with the symmetric limit-cycle design method, discussed under
"Continuous and Sampled-Relay Control" and the first part of this
section, to verify that higher-order limit cycles are eliminated.
For the system of Example 5, the reversal states and prereversal
states for the 2T, 4T, and 6T limit cycles, computed from (29)
and (31), are given in Table 1, together with the values of the
feedback-control function (51) associated with each of these
states. It is apparent from Table 1 that the higher-order limit
cycles are eliminated since the sign of $\sigma$ is the same for both the
reversal and prereversal states, which violates the necessary
condition (28) for the existence of these limit cycles.


Table 1 Reversal and prereversal states and value of feedback-control
function (51) associated with these states, for the symmetric limit cycles
of the system of Example 5

m   Period, sec   Prereversal state x(r - 1)   Reversal state x(r)
1   2T            -0.01                        0.01
                  -0.79                        0.79
                  -3.02                        3.02
                  σ(r - 1) = 2.43              σ(r) = -2.43
2   4T            -0.36                        ...
                  0.67                         1.50
                  3.00                         3.32
                  σ(r - 1) = -1.95             σ(r) = -3.91
3   6T            0.55                         ...
                  1.50                         1.64
                  3.32                         3.33
                  σ(r - 1) = -3.80             σ(r) = -4.58
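The entries of Table 1 can be recomputed from (29) and (31). The sketch below works in modal coordinates, where (31) reduces to $y_i(r) = (M/\alpha_i)\tanh(\alpha_i mT/2)$, and maps back with $x = Py$. The matrix `P` is the inverse of the $P^{-1}$ of Example 5 under the assumed normalization $P^{-1}b = [1, 1, 1]'$; that normalization and all function names are assumptions. The recomputed values agree with the legible entries of the table to within about 0.05.

```python
import math

ALPHAS = [1.0, 2.0, 3.0]                 # characteristic roots are -alpha_i
P = [[0.5, -1.0, 0.5],                   # x = P y (assumed normalization)
     [0.0, 1.0, -1.0],
     [0.0, 0.0, 1.0]]
A_SCALED = [-0.94, -1.0, -0.54]          # feedback coefficients of (51)

def to_x(y):
    """Map modal coordinates y to the original state x = P y."""
    return [sum(P[i][j] * y[j] for j in range(3)) for i in range(3)]

def limit_cycle_states(m, M=10.0, T=1.0):
    """Prereversal state x(r-1) and reversal state x(r) for half-period mT."""
    y_r = [M / a * math.tanh(a * m * T / 2.0) for a in ALPHAS]
    # Step back one period: y(r) = e^{-aT} y(r-1) + (M/a)(1 - e^{-aT}), input +M.
    y_prev = [math.exp(a * T) * (yr - M / a * (1.0 - math.exp(-a * T)))
              for a, yr in zip(ALPHAS, y_r)]
    return to_x(y_prev), to_x(y_r)

def sigma(x):
    """Feedback-control function (51)."""
    return sum(ai * xi for ai, xi in zip(A_SCALED, x))

for m in (1, 2, 3):
    x_prev, x_r = limit_cycle_states(m)
    print(m, [round(v, 2) for v in x_prev], round(sigma(x_prev), 2),
          [round(v, 2) for v in x_r], round(sigma(x_r), 2))
```

A sign change of $\sigma$ between the prereversal and reversal states occurs only for m = 1, which is the condition (28) for that limit cycle to exist.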


Since the reversal surface given in this design eliminates the
higher-order limit cycles, it would be a satisfactory design for
simple relay control provided the inherent (2T) limit-cycle
excursion is tolerable in the particular control problem.

A final comment on the results in Example 5: As shown in
Table 1, the reversal states of the 2T-limit cycle are well within
the pulse-width-control region, $-\beta < \sigma < \beta$, since, for this
example, $\beta = 5.4$. In fact, the equilibrium states toward which all
trajectories tend with control input, $\pm M = \pm 10$, are themselves
within the pulse-width-control region. This indicates that the
extent of the pulse-width-control region, determined by the
Lyapunov method as $\beta = MT$, is overly sufficient for the origin
to be ASIL. For instance, in this example, the limit-cycle
behavior of the system would be eliminated even if $\beta$ were reduced
from 5.4 to 2.5, since the 2T reversal states would still lie within
the pulse-width-control region.

Fig. 11 Synthesis of pulse-width-control system in example 5

The two design methods discussed in this section are well
suited to be used in conjunction to evaluate a practical
pulse-width-control design for plants which are adequately described
as nth-order linear stationary systems with distinct negative real
eigenvalues.

For plants having an integration, i.e., a zero eigenvalue, the
methods do not strictly apply since the limit cycles are not
necessarily symmetric, and since the Lyapunov function $\Delta(\tilde{V})$ in (45)
would not be negative for all $y \ne 0$. However, these methods
may be used to yield a practical design for such plants, provided
the design is verified by analog-computer evaluation, as was done
for the d-c motor-type plant design discussed in Example 4 at the
beginning of this section, with the results shown in Fig. 10. This
design was based on the symmetric limit-cycle-evaluation
method. If the Lyapunov method is used for the system of
Example 4, the feedback-coefficient vector (50) is computed to be

$$a = \begin{bmatrix} -1/(1 + e^{-2T}) \\ -1 \end{bmatrix} = \begin{bmatrix} -0.88 \\ -1.0 \end{bmatrix}, \qquad T = 1 \text{ sec}$$

and the scale factor is $\beta = M$. This agrees very well with the
values chosen in the earlier design and yields a system response
essentially the same as shown in Fig. 10.
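The behavior described for Example 4 can be explored with a small simulation of the pulse-width difference equation (36). This is a sketch only: it assumes the second-order plant of Example 1 with $G$ and $h$ as in (24), the feedback vector obtained from (50) for that plant, and $\beta = MT$; the initial state, the value of $M$, and the function names are arbitrary illustrative choices.

```python
import math

def G(t):
    """Fundamental matrix of the plant c'' + c' = u."""
    e = math.exp(-t)
    return [[1.0, 1.0 - e], [0.0, e]]

def h(t):
    """Forcing vector for a unit input held for t seconds."""
    e = math.exp(-t)
    return [t - 1.0 + e, 1.0 - e]

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def pulse_width_step(x, a, M, T, beta):
    """One sampling period of (36) with the pulse law (35)."""
    sigma = a[0] * x[0] + a[1] * x[1]
    u = M * ((sigma > 0) - (sigma < 0))
    tau = T * min(1.0, abs(sigma) / beta)
    free = matvec(G(T), x)
    forced = matvec(G(T - tau), h(tau))
    return [free[i] + u * forced[i] for i in range(2)]

def simulate(x0, M=1.0, T=1.0, steps=60):
    a = [-1.0 / (1.0 + math.exp(-2.0 * T)), -1.0]   # from (50) for this plant
    beta = M * T
    x = list(x0)
    for _ in range(steps):
        x = pulse_width_step(x, a, M, T, beta)
    return x

final = simulate([2.0, 0.0])
print("distance from the origin after 60 samples:", math.hypot(*final))
```

From this initial state the pulse first saturates at the full sampling period and then shortens as the state nears the reversal line, so the trajectory settles at the origin instead of entering a 2T-limit cycle.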


Conclusions 


The general objective of this paper has been to investigate 
pulse-width control as a method for accurate on-off control of 
sampling systems. 

It has been shown in the section, "Continuous and Sampled-
Relay Control," that linear compensation methods which are
sufficient for accurate on-off regulation in continuous control 
systems, are inadequate when the control is subject to sampled 
data. To ensure that the desired equilibrium state of the plant— 
taken as the origin of the state space in this study—is asymptoti- 
cally stable, the controller must be able to vary continuously over 
a limited range the energy supplied to the plant in each sampling 
period. While this could be accomplished by varying the ampli- 
tude of the control input from zero to some maximum amplitude, 
this method of control is not possible with inherently on-off con- 
trol devices such as the relay. Pulse-width control, however, is 
possible with on-off control devices, since the variation of energy 
is accomplished by allowing continuous variation of the duration 
of the full positive or negative correction applied to the plant 
during a sampling period. 

The first design method discussed in this paper is based upon
the prediction and minimization of the inherent limit-cycle
behavior in relay-control systems. It is shown that this limit-cycle
behavior can be reduced to the minimum 2T-period limit cycle by
linear-feedback compensation. Pulse-width control may then
be added over a range of state space sufficient to eliminate this
minimum limit-cycle behavior if it is not tolerable for the
particular system design.

The second method applies to the class of nth order plants 
with diagonable matrix A having negative real eigenvalues and 
specifies the pulse-width-control function in terms of a linear 
scalar function of the state of the system. This simple form of 
the control function results from an approximation of a Lyapunov 
function defined on the state space; however, it most closely
approximates the exact expression in the terms containing the
dominant roots of the plant, which represent the long-term
transient response of the system. The accuracy of the results is 
also greatest in the critical region of control—the region around 
the desired equilibrium state. From these considerations and 
from the experimental tests conducted during this research, we 
may conclude that this method serves as a good design for pulse- 
width control of plants within the class considered, particularly 
when used in conjunction with the first method, as illustrated in 
Example 5. 

While the application of these design methods may be extended 
to include plants having a zero eigenvalue, the resulting design 
should be verified by analog-computer tests. For the d-c or 
servomotor-type plant considered in the two preceding sections, 
the methods yielded the very effective pulse-width control shown 
in Fig. 10. The pulse-width controller designed by the author 
[1] consists of standard analog-computer components and a 
linear sweep generator (see Appendix 1) and therefore can be 
assembled readily for experimental tests. 

In a subsequent paper [17], the problem of accurate control in
on-off sampling systems is analyzed from the point of view of
the inherent capability of pulse-width-control inputs to achieve
optimum time control. This approach yields some important
practical design techniques which supplement those presented in 
this paper. 


APPENDIX 1 
Description of Experimental Apparatus 


The over-all experimental setup is illustrated in Fig. 12 for the 
d-c motor type plant considered in Examples 1 through 4 of this 
paper. The plant was simulated by standard analog-com- 
puter techniques, the feedback-vector components, A,z, and 
Ker, being derived from the scaled outputs of the respective 
integrator units. The setup could be operated in any one of three 
modes; continuous-relay control, sampled-relay control, and 
sampled-pulse-width-relay control. 


[Block diagram: analog plant, linear sweep generator, sampling trigger, follower amplifier, and input-output recorder]

Fig. 12 Block diagram of experimental system


In the continuous-relay-control mode the feedback-control signal, $\sigma(t)$, bypassed the sample-and-hold and pulse-width-controller units and drove the relay directly. With sampled-relay control, the feedback-control signal was applied to the sample-and-hold unit, which sampled the signal periodically by means of a six-diode gate triggered from the free-running phantastron sweep generator. In order to charge the hold capacitor rapidly while the gate was open, a 6V6 cathode-follower power stage was used to supply the necessary current. The hold capacitor was followed by another cathode-follower stage to prevent leakage of charge from the capacitor during sampling periods up to several seconds in duration. The sampled and held output, $\sigma(kT)$, of this stage then operated the relay.

The requirement for the sampled pulse-width relay control was to operate the relay with a polarity the same as the sampled-feedback-control signal, $\sigma(kT)$, for a duration proportional to the value of $\sigma(kT)$ up to the maximum duration, $\tau_{\max} = T$; i.e.,

$$\tau = T \operatorname{sat}\left| \frac{\sigma(kT)}{\beta} \right|$$

where $T$ = sampling period and $\beta$ is some positive number. The circuit which accomplishes this is shown in Fig. 13. The basic operation is as follows: In the first summer amplifier, the sampled-control signal, $\sigma(kT)$, is added to the negative-going linear sweep, $e_s$, which starts precisely at each sampling instant since the sweep
generator also triggers the sampling gate. Whenever this sum is 
negative,® the first diode conducts, activating the relay in one 
direction. In the second summer amplifier, the negative of the 
control signal is added to the sweep and whenever this sum is 
negative the second diode conducts, activating the relay in the 
opposite direction. 
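
The comparison of the sampled signal with the sweep can be illustrated numerically. The following Python sketch is not a model of the actual circuit; it assumes an idealized sweep whose magnitude rises linearly from zero to `sweep_amplitude` over one sampling period, and the function and parameter names are hypothetical.

```python
def pulse_command(sigma_k, sweep_amplitude, period):
    """Polarity and duration of the relay pulse for one sampling period.

    Assumes an idealized sweep whose magnitude rises linearly from 0 to
    sweep_amplitude during the period (an assumption of this sketch).
    """
    if sweep_amplitude <= 0 or period <= 0:
        raise ValueError("sweep_amplitude and period must be positive")
    polarity = (sigma_k > 0) - (sigma_k < 0)             # +1, 0, or -1
    fraction = min(abs(sigma_k) / sweep_amplitude, 1.0)  # saturates at full width
    return polarity, period * fraction


def relay_state(t, sigma_k, sweep_amplitude, period):
    """Relay state (+1, 0, -1) at time t after the sampling instant."""
    sweep = sweep_amplitude * t / period  # sweep magnitude reached at time t
    if sigma_k > sweep:
        return 1    # positive correction still applied
    if -sigma_k > sweep:
        return -1   # negative correction still applied
    return 0        # sweep has overtaken the sampled signal: relay released
```

In this idealization the relay releases at the instant the sweep magnitude equals the magnitude of the sampled signal, so changing the sweep amplitude rescales the pulse width for a given signal.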


[Diagram: pulse-width controller (two summer amplifiers, each followed by a diode, fed by the sampled signal $\sigma(kT)$ and the sweep input) driving the relay]

Fig. 13 Diagram of pulse width controller and relay


The manner in which these operations provide the desired 
pulse-width-control action is shown in the waveform diagram, 
Fig. 14. Note that the full pulse width, $\tau = T$, occurs whenever
the magnitude of the sampled-control signal is equal to or greater
than the sweep amplitude. Therefore the scale factor, $\beta$, required
in the pulse-width function, is adjusted simply by adjusting
the amplitude of the sweep voltage. It is also apparent from
Fig. 14 that the saturation limits of either of the summer ampli- 
fiers will have no effect on the desired pulse-width-control func- 
tion; in fact, it is desirable for the summers to have high gain 
(and hence be driven into saturation) in order to achieve rapid 
relay action. 

The signal which operates each coil of the relay is always characterized by an abrupt leading edge and a linearly decreasing trailing edge, which gives a “hard” close and “soft” release action to the relay, a desirable feature for the rocker-arm-type relay used. For this single-pole-type relay, separate positive and negative power supplies were required (indicated as $+M$ and $-M$). However, with a double rocker-arm-type relay, or two standard 2-pole relays, a single ungrounded d-c power source could be used to provide the driving power for the plant [1].

® Note that the operational summers give an output which is the negative of the sum of the input signals.


APPENDIX 2 


The steps in the derivation of the explicit scalar form (43) of 
the sufficient condition for ASIL of the equilibrium state y, = 0 
are as follows: 


With $V(y) = y'y$, the sufficient condition for ASIL is, from (42),

$$\Delta(V) = y'(k+1)\,y(k+1) - y'(k)\,y(k) < 0$$

Writing the state-transition equation in Y-space in the form

$$y(k+1) = E(T)\,y(k) + q(k)$$

where $q(k)$ is the vector whose components follow from (41), the sufficient condition now becomes

$$\Delta(V) = [E(T)\,y(k) + q(k)]'\,[E(T)\,y(k) + q(k)] - y'(k)\,y(k) < 0$$

For convenience this is abbreviated to

$$\Delta(V) = (Ey + q)'(Ey + q) - y'y < 0$$

Since $y'E'q = q'Ey$, combining terms gives

$$\Delta(V) = y'(E'E - I)\,y + 2q'Ey + q'q < 0$$

where $I$ is the identity matrix.

Fig. 14 Typical waveforms for pulse-width controller

Since $E$ is the diagonal matrix given below equation (41) and $q$
is the vector written in component form above, the scalar products 
indicated can be written in the summation form 


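$$\sum_{i=1}^{n} (e_i^2 - 1)\,y_i^2 + 2\sum_{i=1}^{n} e_i\,q_i\,y_i + \sum_{i=1}^{n} q_i^2 < 0$$

where $y_i$ is the $i$th component of the $n$-vector $y$, $e_i$ is the $i$th diagonal element of $E$, and $q_i$ is the $i$th component of $q$. This is the explicit scalar form of the sufficient condition for ASIL given in (43).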
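
The algebra of this appendix can be checked numerically. The Python sketch below compares the change in $V = y'y$ over one step, computed directly from $y(k+1) = Ey + q$, with the expanded form $y'(E'E - I)y + 2q'Ey + q'q$ for a diagonal $E$. The numerical values of the diagonal elements, of $q$, and of $y$ are hypothetical and serve only as a test case.

```python
import math


def delta_v_direct(e_diag, q, y):
    """Change in V = y'y over one step, with y_next = E y + q and E diagonal."""
    y_next = [e * yi + qi for e, yi, qi in zip(e_diag, y, q)]
    return sum(v * v for v in y_next) - sum(v * v for v in y)


def delta_v_expanded(e_diag, q, y):
    """Same quantity from the expanded form y'(E'E - I)y + 2 q'E y + q'q."""
    quadratic = sum((e * e - 1.0) * yi * yi for e, yi in zip(e_diag, y))
    cross = 2.0 * sum(qi * e * yi for e, yi, qi in zip(e_diag, y, q))
    constant = sum(qi * qi for qi in q)
    return quadratic + cross + constant


# Hypothetical test case: two negative real eigenvalues, T = 1.
e_diag = [math.exp(-0.5), math.exp(-2.0)]
q = [0.3, -0.1]
y = [1.0, -2.0]
```

For these values both functions return the same negative number, so the sufficient condition is satisfied at this particular state.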
Investigation of Periodic Modes
of Sampled-Data Control Systems
Containing a Saturating Element


A discrete sequence approach is presented in this paper for investigating the possible modes of sampled-data control systems containing a saturating element. This method enables the investigator (a) to determine whether or not a control system has certain periodic modes, and (b) if it does, to identify them exactly. This approach is applicable to a sampled-data control system of any order. It is extended to the cases where the form of feedback is other than unity.


Sampled-data control systems with saturating elements
have been studied by Kalman [1],¹ Nease [2], Mullin [3],
Desoer [4], and Torng [5].

A direct and concise means of studying the stability of a 
sampled-data control system with a saturating element is still 
not available. In such a system, instability may bring about 
periodic oscillations, or nonperiodic oscillations, or both. A pe- 
riodic oscillation here means that the period of this oscillation is 
related to the sampling period by a rational number. 

In this paper, the discrete sequence method [6], [7] developed 
by the authors to study relay sampled-data control systems is 
modified and extended to determine periodic modes of sampled- 
data control systems containing saturating elements. 

Izawa and Weaver [8-10] developed an approach for revealing 
possible modes of a relay sampled-data control system. This 
amounts to an estimation of possible periods of the cyclic re- 
sponse. By using the method presented in this paper the in- 
vestigator can be sure whether or not the system under investiga- 
tion has such possible periodic modes. If it does, one will be able 
to identify them exactly. The computation involved is very sim- 
ple and can be carried out easily. 

It is believed that this represents one step forward in studying the stability of sampled-data control systems containing saturating elements. The problem of possible nonperiodic oscillations, however, remains unsolved.

¹ Numbers in brackets designate References at end of paper.

Contributed by the Instruments and Regulators Division of THE AMERICAN SOCIETY OF MECHANICAL ENGINEERS and presented at the Joint Automatic Control Conference, Cambridge, Mass., September 7-9, 1960. Manuscript received at ASME Headquarters, June 6, 1960. ASME Paper No. 60-JAC-9.


Description of System 


The sampled-data system studied in this paper is shown in Fig. 
1. The sampling rate is uniform. 

A conventional piecewise linear representation is used to desig- 
nate the saturating element. Its characteristic is shown in Fig. 2. 
The linear region can be replaced by a curve if that describes
the actual element more faithfully.

When the linear plant of the system shown in Fig. 1 can be represented by a transfer function $G(s)$ which has $p$ poles, a difference equation of order $p$ is available to relate the outputs of the system and of the saturating element as follows:

$$a_0 c_{n+p} + a_1 c_{n+p-1} + \dots + a_p c_n = b_0 f_{n+p} + b_1 f_{n+p-1} + \dots + b_p f_n \qquad (1)$$


Nomenclature

$c_n$ = output of system at $n$th sampling instant
$f_n$ = output of saturating element at $n$th sampling instant
$a_l, b_l$ = constants defined in equation (1)
$T$ = sampling period
$N$ = a positive integer
$\alpha_n, \beta_n$ ($n = 0, 1, \dots, K+1$) = constants defined in equations (3) and (4)
$v_r, q_r$ ($r = 0, 1, \dots, k+1$) = constants defined in equations (5) and (6)
$g_r, h_r$ ($r = 0, 1, \dots, k+1$) = constants defined in equations (7) and (8)
$k$ = variable between $-1$ and 1, inclusive
$r_n$ = input to system at $n$th sampling instant
$G(s)$ = transfer function of linear plant in forward loop
$H(s)$ = transfer function of element in feedback link


Fig. 2 Characteristic of saturating element


The Discrete Sequence Method 


When a periodic mode of period NT exists in a sampled-data 
control system containing a saturating element, the output of the 
system at sampling instants will repeat N-values successively. 

As the periodic output is fed back and sampled, the output of 
the zero-order hold at the sampling instants will also repeat a 
set of N-values. Such a set of numbers is defined as a sequence. 
This sequence when passed through the saturating element is 
quantized into a periodic one which consists of terms whose mag- 
nitudes can only be +1, —1 or values between them. 

In investigating the oscillations that might occur in a conven- 
tional feedback-control system, the possible periodic output of 
the system can be expressed as a Fourier series with an infinite 
number of terms. This often makes the complete evaluation im- 
possible. 

In the present investigation, a repetitive discrete sequence, 
representing the output of the system at sampling instants is to 
be determined. One question which will be raised is whether or 
not an expression with a finite number of terms can be found to 
represent the discrete sequence. 

It has been established that this can be done in terms of orthogonal functions in the discrete sense. A set of functions $S_1(i), S_2(i), \dots, S_N(i)$ with integral argument $i$, defined when

$$i = 0, 1, \dots, N-1$$

is said to be orthogonal in the discrete sense if

$$\sum_{i=0}^{N-1} S_l(i)\,S_m(i) = 0, \qquad l \neq m$$

A periodic sequence $P(i)$ of period $N$ can be written in one and only one way as a linear function of $S_1, S_2, \dots, S_N$ with constant coefficients [11].

Trigonometric functions with integral argument form an or- 
thogonal set. In terms of this set of orthogonal functions, the 
periodic sequence can be expressed as follows: 


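$$P(i) = \sum_{n=0}^{K} \left( \alpha_n \cos\frac{2\pi n i}{N} + \beta_n \sin\frac{2\pi n i}{N} \right) \qquad (3)$$

if $N = 2K + 1$, and

$$P(i) = \sum_{n=0}^{K} \left( \alpha_n \cos\frac{2\pi n i}{N} + \beta_n \sin\frac{2\pi n i}{N} \right) + \alpha_{K+1} \cos \pi i$$

if $N = 2K + 2$, where

$$\alpha_0 = \frac{1}{N} \sum_{i=0}^{N-1} P(i), \qquad \alpha_n = \frac{2}{N} \sum_{i=0}^{N-1} P(i) \cos\frac{2\pi n i}{N}, \quad n = 1, 2, \dots, K$$

$$\beta_n = \frac{2}{N} \sum_{i=0}^{N-1} P(i) \sin\frac{2\pi n i}{N} \qquad (4)$$

$$\alpha_{K+1} = \frac{1}{N} \sum_{i=0}^{N-1} P(i) \cos \pi i$$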
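
The finite trigonometric expansion of a periodic sequence can be tried out directly. The Python sketch below computes the coefficients of one period of a sequence and rebuilds the sequence from them; the constant term and, for even $N$, the alternating term are scaled by $1/N$ and the remaining harmonics by $2/N$. The function names are hypothetical and the sketch is only an illustration of the expansion, not the authors' computing procedure.

```python
import math


def fourier_coefficients(p):
    """Coefficients alpha_n, beta_n of one period p[0..N-1] of a sequence."""
    n_period = len(p)
    top = n_period // 2  # K for odd N = 2K + 1, K + 1 for even N = 2K + 2
    alpha, beta = [], []
    for n in range(top + 1):
        # The constant term and, for even N, the alternating term carry 1/N;
        # every other harmonic carries 2/N.
        scale = 1.0 if (n == 0 or 2 * n == n_period) else 2.0
        angle = 2.0 * math.pi * n / n_period
        alpha.append(scale / n_period
                     * sum(p[i] * math.cos(angle * i) for i in range(n_period)))
        beta.append(scale / n_period
                    * sum(p[i] * math.sin(angle * i) for i in range(n_period)))
    return alpha, beta


def reconstruct(alpha, beta, n_period, i):
    """Value of the finite trigonometric expansion at integer argument i."""
    return sum(
        a * math.cos(2.0 * math.pi * n * i / n_period)
        + b * math.sin(2.0 * math.pi * n * i / n_period)
        for n, (a, b) in enumerate(zip(alpha, beta))
    )
```

For the period-4 sequence 1, 1, $-1$, $-1$ the only nonzero coefficients are the first cosine and sine terms, each equal to 1, and the reconstruction returns the original four values.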


Determination Process 


If an oscillation of period NT prevails in a saturating sampled- 
data feedback control system, the output of the system will repeat 
a set of N-values: 


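$$c_n = \sum_{r=0}^{k} \left( v_r \cos\frac{2\pi r n}{N} + q_r \sin\frac{2\pi r n}{N} \right) + v_{k+1} \cos \pi n \qquad (5)$$

where $N = 2k + 2$, and

$$c_n = \sum_{r=0}^{k} \left( v_r \cos\frac{2\pi r n}{N} + q_r \sin\frac{2\pi r n}{N} \right) \qquad (6)$$

where $N = 2k + 1$. Then

$$f_n = \sum_{r=0}^{k} \left( g_r \cos\frac{2\pi r n}{N} + h_r \sin\frac{2\pi r n}{N} \right) + g_{k+1} \cos \pi n \qquad (7)$$

where $N = 2k + 2$, and

$$f_n = \sum_{r=0}^{k} \left( g_r \cos\frac{2\pi r n}{N} + h_r \sin\frac{2\pi r n}{N} \right) \qquad (8)$$

where $N = 2k + 1$.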

In investigating the possible periodic modes, the value of $N$ is first assigned. It may be objected that, since $N$ is arbitrarily assigned, the task of identifying the periodic modes will be endless. In fact, the possible values of $N$ for a sampled-data system can be determined by Izawa’s study [8, 9, 10]. The relationship between the oscillations of a relay sampled-data control system and those of a sampled-data control system containing a saturating element has been established by one of the authors of this paper.

Once the value of $N$ is determined, the periodic output sequence of the saturating element can be postulated. It might seem that, even with the value of $N$ specified, there are numerous possibilities for such a postulation. As shown in reference [6], however, the number of possible cases to be investigated is quite limited. Of course this statement has to be modified when $N$ becomes large, but it also should be pointed out that, with a large $N$, the step-by-step computation presents an even more unpleasant picture. The presence of a linear region does not complicate the issue. The considerations in determining the possible sequences are presented in the example to follow.

The postulated sequence, by means of equations (3) and (4), will determine the coefficients $g_0, g_1, \dots, g_{k+1}, h_1, h_2, \dots, h_k$ in equations (7) and (8). It is assumed that, whenever $N$ is odd, $v_{k+1}$ and $g_{k+1}$ are zero.

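Substituting equations (5) and (7) or (6) and (8) into equation (1), and collecting terms, one obtains

$$\sum_{l=0}^{p} a_{p-l} \left( v_r \cos\frac{2\pi r l}{N} + q_r \sin\frac{2\pi r l}{N} \right) = \sum_{l=0}^{p} b_{p-l} \left( g_r \cos\frac{2\pi r l}{N} + h_r \sin\frac{2\pi r l}{N} \right) \qquad (9)$$

$$r = 0, 1, \dots, k+1$$

$$\sum_{l=0}^{p} a_{p-l} \left( q_r \cos\frac{2\pi r l}{N} - v_r \sin\frac{2\pi r l}{N} \right) = \sum_{l=0}^{p} b_{p-l} \left( h_r \cos\frac{2\pi r l}{N} - g_r \sin\frac{2\pi r l}{N} \right) \qquad (10)$$

$$r = 1, 2, \dots, k$$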


It should be pointed out here that these $N$ equations are linear. Furthermore, for each $r$, $v_r$ and $q_r$ are related in two and only two equations. They can be easily solved. The determination of the $N$ unknowns enables the periodic output sequence to be evaluated from equation (5) or (6). The postulated output of the system at the sampling instants will be $c_1, c_2, \dots, c_N$. This is done by assigning values of $n$ in equations (5) or (6) to be $1, 2, \dots, N$, respectively.

The postulated sequence $c_1, c_2, \dots, c_N$ has to be checked for its validity. This is done by feeding $c_1, c_2, \dots, c_N$ successively back to the input. This feedback will produce a saturating element output sequence. If the sequence so produced is the same as that by which the coefficients in equation (7) or equation (8) are determined, then there is certainly a periodic oscillation which is defined by $c_1, c_2, \dots, c_N$.

The fact that the output of the saturating element may be 1, $-1$, or any value between these two values seems to complicate the process of formulating the postulated sequence and of verifying its validity. Instead of assigning definite values to the sequence, one can designate variables $K_s$ ($s = 0, 1, \dots$) to alleviate the difficulty. When conditions are formulated to guarantee the validity of the postulated sequence, the variable $K$ will be evaluated by considering the characteristic of the saturating element. This is a very significant departure from the method used for relay sampled-data control systems.


Illustrative Example

The system in Fig. 1 will be studied, with

$$G(s) = \frac{1}{s(s+1)}, \qquad T = 1\ \text{sec}$$

and $r_n = 0$. The characteristic of the saturating element, shown in Fig. 2, is

$$f = f(e) = \begin{cases} 1, & e > 0.2 \\ 5e, & -0.2 \le e \le 0.2 \\ -1, & e < -0.2 \end{cases}$$

The difference equation relating the output of the system and 
the output of the saturating element at sampling instants can be 
found by direct integration [2] or from z-transform tables by 
taking the z-transform of the transfer function of the linear 
plant. 

The difference equation for the system under study is found to 
be 


$$c_{n+2} - 1.368\,c_{n+1} + 0.368\,c_n = 0.368\,f_{n+1} + 0.264\,f_n \qquad (11)$$


As pointed out before, the number of possible periodic modes is limited. In the present study, only three possible values [10], 2, 4, and 6, can be taken by $N$. (The period is actually $NT$, but since $T$, the sampling period, is constant, only $N$ is mentioned.)

Let $N = 4$ and determine whether or not this system has any oscillation of period $4T$.

The first task is to determine the possible postulated output sequences for the saturating element. It seems at first sight that there are many possible sequences to be considered. Actually this is not true. One can consider the fact that the saturating element cannot operate throughout the oscillation entirely in the linear region. (If it does, then it can be revealed by general techniques for linear sampled-data control systems.) Another consideration is that the sum of the terms in the sequence has to be zero. This can be explained by letting $r = 0$ in equation (9); then

$$v_0 \sum_{l=0}^{p} a_{p-l} = g_0 \sum_{l=0}^{p} b_{p-l}$$

In this problem

$$\sum_{l=0}^{p} a_{p-l} = 1 - 1.368 + 0.368 = 0$$

$$\sum_{l=0}^{p} b_{p-l} = 0.368 + 0.264 = 0.632$$

The oscillation can only occur when $g_0 = 0$. This means that the sum of the terms in the sequence has to be zero. It can be concluded that the possible sequences can only be


$$\begin{aligned} &\text{(a)} \quad 1,\ k,\ -k,\ -1 \\ &\text{(b)} \quad 1,\ k,\ -1,\ -k \end{aligned} \qquad (12)$$

where

$$-1 \le k \le 1$$


By the foregoing reasoning, it is shown that the task of finding the possible sequences is not difficult. Furthermore, the fact that

$$\sum_{l=0}^{p} a_{p-l} = 0$$

is not a happy accident. It is due to the presence of an integrator in the linear plant, which is a rather common situation.
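Sequence (a) of equation (12) is first studied. From equation (4),

$$g_0 = \frac{1}{4} \sum_{i=0}^{3} P(i) = 0$$

$$g_1 = \frac{1}{2} \sum_{i=0}^{3} P(i) \cos\frac{2\pi i}{4} = \tfrac{1}{2}(1 + k)$$

$$h_1 = \frac{1}{2} \sum_{i=0}^{3} P(i) \sin\frac{2\pi i}{4} = \tfrac{1}{2}(1 + k)$$

Let

$$f_n = \tfrac{1}{2}(1 + k) \cos\frac{\pi}{2}n + (1 - k) \cos \pi n + \tfrac{1}{2}(1 + k) \sin\frac{\pi}{2}n$$

$$c_n = v_0 + v_1 \cos\frac{\pi}{2}n + v_2 \cos \pi n + q_1 \sin\frac{\pi}{2}n \qquad (13)$$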
Four equations can be obtained. From equation (9), let $r = 0$, 1, and 2, respectively:

$$v_0(1 - 1.368 + 0.368) = 0 \qquad (14)$$

$$-0.632\,v_1 - 1.368\,q_1 = 0.316(1 + k) \qquad (15)$$

$$2.736\,v_2 = -0.104(1 - k) \qquad (16)$$

From equation (10), set $r = 1$:

$$1.368\,v_1 - 0.632\,q_1 = -0.052(1 + k) \qquad (17)$$

Solving equation (16) for $v_2$ gives

$$v_2 = -0.038(1 - k)$$

Solving equations (15) and (17) for $v_1$ and $q_1$ (it should be pointed out that $v_1$ and $q_1$ are related in two and only two equations) gives

$$v_1 = -0.119(1 + k)$$

$$q_1 = -0.176(1 + k)$$

From equation (14), $v_0$ is arbitrary.
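
The solution of equations (15) to (17) is easily checked by machine. The Python sketch below solves the pair (15), (17) by Cramer's rule and equation (16) by division. The coefficient 0.632 multiplying $q_1$ in equation (17) is taken here as an assumption, since it follows from equation (10) with the coefficients of equation (11); the function names are hypothetical.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] [x, y]' = [b1, b2]' by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def harmonic_coefficients(k):
    """Return (v1, q1, v2) for sequence (a) with the given value of k."""
    # Equations (15) and (17); the 0.632 coefficient of q1 in (17) is assumed.
    v1, q1 = solve_2x2(-0.632, -1.368,
                       1.368, -0.632,
                       0.316 * (1.0 + k), -0.052 * (1.0 + k))
    # Equation (16).
    v2 = -0.104 * (1.0 - k) / 2.736
    return v1, q1, v2
```

With $k = 0$ the sketch returns values that round to $-0.119$, $-0.176$, and $-0.038$, in agreement with the coefficients quoted above.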


Equation (13) can be written

$$c_n = v_0 - 0.119(1 + k) \cos\frac{\pi}{2}n - 0.038(1 - k) \cos \pi n - 0.176(1 + k) \sin\frac{\pi}{2}n \qquad (18)$$

Let $n$ in equation (18) be 0, 1, 2, 3, and one obtains the output sequence of the system, as follows:

$$\begin{aligned} c_0 &= v_0 - 0.081k - 0.157 \\ c_1 &= v_0 - 0.214k - 0.138 \\ c_2 &= v_0 + 0.157k + 0.081 \\ c_3 &= v_0 + 0.138k + 0.214 \end{aligned} \qquad (19)$$


The next step is to check the validity of this sequence. It is done by feeding $c_0$, $c_1$, $c_2$, and $c_3$ back through the feedback loop and examining whether or not the postulated saturating element output sequence can be produced; that is,

$$-c_0 = 0.157 + 0.081k - v_0 > 0.2; \qquad (20)$$

only when this condition is met can $+1$ be the output of the saturating element.

$$-c_3 = -v_0 - 0.138k - 0.214 < -0.2; \qquad (21)$$

only when this condition is met can $-1$ be the output of the saturating element.

If the saturating element operates, during part of each cycle, in the linear region, then the following two conditions must be met:

$$-5c_1 = -5(v_0 - 0.214k - 0.138) = k \qquad (22)$$

$$-5c_2 = -5(v_0 + 0.157k + 0.081) = -k \qquad (23)$$

This is a significant departure from the procedure used for relay sampled-data control systems.

Solving equations (22) and (23) for $k$ and $v_0$, it is found that
$k = 7.55$. Since $k$ has to be equal to or less than 1, it is not
possible for such an oscillation to occur.
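
The value of $k$ can be reproduced with a few lines of arithmetic. The Python sketch below rewrites each of equations (22) and (23) as a straight line giving $v_0$ in terms of $k$ and finds the intersection; the function name is hypothetical.

```python
def solve_linear_region():
    """Solve equations (22) and (23) for k and v0."""
    # (22): -5 (v0 - 0.214 k - 0.138) = k   gives  v0 = (0.214 - 0.2) k + 0.138
    slope_22, const_22 = 0.214 - 0.2, 0.138
    # (23): -5 (v0 + 0.157 k + 0.081) = -k  gives  v0 = (0.2 - 0.157) k - 0.081
    slope_23, const_23 = 0.2 - 0.157, -0.081
    k = (const_22 - const_23) / (slope_23 - slope_22)
    v0 = slope_22 * k + const_22
    return k, v0
```

The intersection lies well outside the admissible range of $k$, which is the reason the oscillation through the linear region is ruled out.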

To complete the presentation of the procedure, suppose $k$ is found to be in the range allowed. Then $v_0$ can be evaluated from either equation (22) or equation (23).

With $k$ and $v_0$ evaluated, the conditions expressed by equations (20) and (21) can be checked.

When all the conditions are met, the oscillation can be exactly determined by equation (19).

If, in any of the foregoing steps, there is a violation of any condition stated, it can be concluded that this system will not have such an oscillation involving a linear region of the saturating element.

In the present study, one can say that there is no oscillation of period 4 involving the linear region of the saturating element. It still has to be investigated whether or not an oscillation of period 4 in the saturation region alone can exist. To achieve this, let


(a) $k = 1$; then the postulated sequence is 1, 1, $-1$, $-1$. The output sequence of the system at sampling instants will be

$$\begin{aligned} c_0 &= v_0 - 0.238 \\ c_1 &= v_0 - 0.352 \\ c_2 &= v_0 + 0.238 \\ c_3 &= v_0 + 0.352 \end{aligned} \qquad (24)$$

So long as $|v_0| < 0.038$, the operation of the saturating element will be in the saturating region and an oscillation specified by equation (24) can exist.

Fig. 3 Response of system studied in example

With the following set of initial conditions:

$$e(t)\big|_{t=0} = 1, \qquad \frac{de(t)}{dt}\Big|_{t=0} = 0, \qquad r_n = 0,$$

the response of the system is calculated and plotted in Fig. 3. A steady-state oscillation is established, which confirms the validity of equation (24).
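
The period-4 oscillation can also be reproduced by stepping the difference equation (11) with the saturating characteristic of the example and zero input. The Python sketch below is an illustration only: it starts from the two output values of equation (24) with $v_0 = 0$ rather than from the initial conditions used for Fig. 3, and the function names are hypothetical.

```python
def saturate(e):
    """Saturating element of the example: slope 5, limits +1 and -1."""
    return max(-1.0, min(1.0, 5.0 * e))


def simulate(c0, c1, steps):
    """Step equation (11) with unity feedback and zero input (e_n = -c_n)."""
    c = [c0, c1]
    f = [saturate(-c0), saturate(-c1)]
    for n in range(steps - 2):
        c_next = (1.368 * c[n + 1] - 0.368 * c[n]
                  + 0.368 * f[n + 1] + 0.264 * f[n])
        c.append(c_next)
        f.append(saturate(-c_next))
    return c, f


# Start on the postulated oscillation of equation (24) with v0 = 0.
outputs, element = simulate(-0.238, -0.352, 40)
```

After the short transient caused by the rounded starting values, the element output repeats 1, 1, $-1$, $-1$ and the system output repeats every four samples with half-swings close to 0.238 and 0.352.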


(b) $k = -1$; then the postulated sequence is 1, $-1$, 1, $-1$. The output sequence of the system at sampling instants will be

$$\begin{aligned} c_0 &= v_0 - 0.076 \\ c_1 &= v_0 + 0.076 \\ c_2 &= v_0 - 0.076 \\ c_3 &= v_0 + 0.076 \end{aligned} \qquad (25)$$

No value of $v_0$ can be found to establish its validity, and it can be concluded that there cannot be such an oscillation in the system.

The postulated sequence 1, $k$, $-1$, $-k$ can be investigated in the same fashion. Although this example is for a second-order plant, the method is applicable to plants of any order, and is no more difficult.


Discussion


All previous discussion presumes that the feedback is unity.


If a form of feedback with a transfer function H(s) exists, it can 
be combined with the linear plant transfer function G(s) to get 
G(s)H(s). A difference equation can then be derived. 

If the transfer function of the linear plant or the feedback contains a factor $1/s$, then

$$\sum_{l=0}^{p} a_{p-l}$$

in equation (9) would be zero. An appropriate constant or ramp
function can be added to the right-hand side of equation (5) or 
equation (6) to take care of possible step or ramp functions at the 
input. Sinusoidal functions whose frequencies are multiples or 
fractional multiples of the sampling frequency can be treated in 
the same fashion. 


A systematic and conclusive method is presented to reveal all 
possible periodic modes of a sampled-data control system contain- 
ing a saturating element in the forward loop. There is no restric- 
tion on the order of the transfer functions of the linear plant or 
the feedback element. 

Used with Izawa’s approach [8, 9, 10], this method yields a 
very satisfactory means of identifying all periodic modes in the 
system. 

It is also the first step in using orthogonal functions with integral arguments for studying saturating, discrete control systems.


DISCUSSION

E. Jury²

The authors are to be commended for this and their other enlightening contributions to the theory of nonlinear sampled-data systems.

In this brief discussion, the discusser would like to point out that the study of pulse-width modulated systems [12, 13] reduces under certain conditions [13] to the study of sampled-data systems with saturations. Such pulse-width modulated sampled-data systems have recently been under investigation, and necessary conditions on their asymptotic stability in the large have been established through the use of Lyapunov's second method [13]. The authors’ investigations of periodic oscillations could be easily applied to such pulse-width modulated feedback systems.

² Associate Professor, Department of Electrical Engineering, University of California, Berkeley, Calif.

An alternate approach to obtain these periodic modes is based 
on the recognition of the situation that saturating sampled-data 
systems having sustained oscillation reduce under certain condi- 
tions to sampled-data systems with periodically varying sampling 
rate and pulse width. A study of such systems has been recently 
proposed [14] and further extension of this method to obtain the 
periodic modes in P.W.M. systems has been attempted and was 
briefly added to the cited reference. 

It might be added in passing that the periodic study of relay
sampled-data systems could also be reduced to the study of the
foregoing case or to cyclic-rate sampled-data systems, the
latter having been studied by Hufnagel [15] and others [16, 17].

Conditions for eliminating these periodic modes, if they exist, 
would aid in the design of saturating sampled-data feedback 
systems. In this connection a very recent paper has been pre- 
sented [18] which specifically deals with this situation and might 
complement the interesting contents of this paper. 

As a final remark it might be mentioned that the case of a pure 
delay in the plant transfer function could be easily tackled with 
minor modifications of the difference equations using the authors’ 
method. 

In conclusion, the authors have extended and improved on the available techniques for the study of periodic modes in nonlinear sampled-data systems, and the application of the results obtained in this paper can be extended to pulse-width modulated systems or equivalently to feedback systems employing magnetic amplifiers or other nonlinear triggering devices as part of the loop.

The authors thank Professor Jury for his gracious comments
on our paper. It is certainly helpful to point out the correlations 
between pulse-width modulated systems and sampled-data 
systems with saturation. 

We also appreciate Professor Jury’s remarks on extending the 
approach we presented to other situations. 

The study of nonlinear sampled-data systems is a challenging 
task. We are happy to see that it is being studied vigorously. 


* Numbers in brackets from [12-18] designate References at end of discussion.


A major difficulty in the way of a successful systematic approach to the study of control processes by way of the theory of dynamic programming is the occurrence of processes having state vectors of high dimension. However difficult the problem is for systems ruled by a finite set of differential equations, it is several orders of magnitude more complex for systems of infinite dimensionality and for systems with time lags. By combining a technique presented earlier for dealing with finite dimensional systems and various methods of successive approximations and quasi-linearization, certain classes of control processes associated with infinite dimensional systems can be treated. The ideas are illustrated by discussing control of a system involving a time lag and control of a thermal system.

Introduction 


The major difficulty in the way of a successful systematic approach to the study of control processes by way of the theory of dynamic programming is the dimensionality of the state vectors. However complex this problem is in dealing with finite dimensional systems ruled by a finite set of differential equations, the problem is several orders of magnitude further from solution in the case where infinite dimensional systems are being treated, or where we have finite dimensional systems with time lags.

Combining a technique first presented in [4]¹ for dealing with finite dimensional systems of large dimension with various methods of successive approximations and quasi-linearization, cf. [8], [3], certain classes of control processes associated with infinite dimensional systems can be treated. In what follows, we shall illustrate this by means of a problem pertaining to the simple differential-difference equation

$$\frac{du}{dt} = c\,u(t-1) + g(t), \qquad t > 1, \qquad (1)$$

and the heat equation

$$u_t = u_{xx} + g(x, t), \qquad t > 0, \quad 0 < x < 1. \qquad (2)$$


A Control Process With Time Lags 


Let us consider a process described by the scalar differential-difference equation

$$\frac{du}{dt} = c\,u(t-1) + g(t), \qquad t > 1, \qquad (3)$$

with $u(t)$ prescribed over the initial interval, $0 \le t \le 1$, by the condition

$$u(t) = h(t), \qquad 0 \le t \le 1 \qquad (4)$$


Let us suppose that we are interested in terminal control and wish to choose $g(t)$, subject to constraints such as

$$|g(t)| \le k_1, \qquad t > 1, \qquad (5a)$$

$$\int_1^T g^2(t)\,dt \le k_2 \qquad (5b)$$

so as to minimize a preassigned function $\phi(u(T))$, where $T$ is a fixed time.

If we follow the usual pattern, cf. [6], [5], we consider the new functional

$$\phi(u(T)) + \lambda \int_1^T g^2(t)\,dt, \qquad (6)$$

and introduce the functional, $f(h(t); T)$, defined by the relation

$$f(h(t); T) = \min_{g} \left[ \phi(u(T)) + \lambda \int_1^T g^2(t)\,dt \right] \qquad (7)$$

where $g(t)$ is subject to (5a).

¹ Numbers in brackets designate References at end of paper.

Contributed by the Instruments and Regulators Division of THE AMERICAN SOCIETY OF MECHANICAL ENGINEERS and presented at the Joint Automatic Control Conference, Cambridge, Mass., September 7-9, 1960. Manuscript received at ASME Headquarters, June 6, 1960. ASME Paper No. 60-JAC-6.

Although a certain amount can be done analytically starting 
from this formulation and applying the principle of optimality, 
in general, this approach is stymied from the start by the com- 
plete impossibility of treating the functional f(h(t); 7’) computa- 
tionally. 

Instead of this approach, we wish to present a method which re- 
duces the computational solution to that of the determination 
of a sequence of functions of one variable—the ideal situation. 


~— on Linear Differential-Difference Equations 
[2] 


We require the following facts concerning the solution of the 
equation of (3) subject to the initial condition of (4). 
Lemma. The solution of (3) with (4) is given by 


$$u(t) = h(1)\,K(t-1) + c\int_0^1 K(t - t_1 - 1)\,h(t_1)\,dt_1 + \int_1^t K(t - t_1)\,g(t_1)\,dt_1 \qquad (8)$$

where the kernel $K(t)$ is determined as follows:

$$\begin{aligned} K'(t) &= c\,K(t-1), & t > 1, \\ K(t) &= 1, & 0 \le t \le 1, \\ K(t) &= 0, & t < 0 \end{aligned} \qquad (9)$$

The linearity of the equation permits a simple separation of the effect due to the initial state from the effect due to control.

Alternative Functional Equation Approach 


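Applying the preceding lemma, we see that

$$\begin{aligned} \phi(u(T)) &= \phi\left( h(1)\,K(T-1) + c\int_0^1 K(T - t_1 - 1)\,h(t_1)\,dt_1 + \int_1^T K(T - t_1)\,g(t_1)\,dt_1 \right) \\ &= \phi\left( b + \int_1^T K(T - t_1)\,g(t_1)\,dt_1 \right) \end{aligned} \qquad (10)$$

where $b$ has the following simple interpretation:

$b$ = the state of the system at time $T$ if no control is exerted for $t > 1$; i.e., $g(t) = 0$, $t > 1$ (11)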


It follows that we may write 


+X f, | = f(b, T) (12) 


The principle of optimality now yields the equation 


1+A 
f(b, T) = min [a f g*(t,)dt, 
1+4] 1 


+5 (0 + K(T — tg(tdh, T a) |, (13) 
or, to terms in 0(A), 
f(o, T) = + fib + K(T — 1)g(1)A, T — A)) 
(14) 
Alen, we have the initial condition 
f(b, 0) = (6) (15) 


This completes our reduction of the variational problem to that 
of determining a sequence of functions of one variable [1]. 
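
The recursion (14) with initial condition (15) can be tabulated numerically on a grid of values of b. The following is a minimal sketch, not the authors' program. It assumes a forward-Euler tabulation of the kernel K, a symmetric bound |g| <= g_max standing in for (5a), linear interpolation in b, and an indexing of the functions by the time to go s rather than by T. The names `delay_kernel`, `interp`, and `solve_terminal_control`, and all numerical parameters, are hypothetical.

```python
def delay_kernel(c, s_max, dt):
    """Tabulate K on [0, s_max]: K = 1 on [0, 1], K'(t) = c*K(t - 1) for t > 1."""
    n = int(round(s_max / dt)) + 1
    lag = int(round(1.0 / dt))
    K = [1.0] * n
    for i in range(lag, n - 1):
        K[i + 1] = K[i] + dt * c * K[i - lag]  # forward Euler step
    return K


def interp(grid_min, db, values, b):
    """Linear interpolation of tabulated values, clamped at the grid ends."""
    pos = (b - grid_min) / db
    pos = min(max(pos, 0.0), len(values) - 1.0)
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


def solve_terminal_control(phi, c=-0.5, lam=0.1, g_max=1.0, horizon=1.0,
                           dt=0.05, b_min=-2.0, b_max=2.0, nb=81, ng=21):
    """Return the b grid and the tabulated cost function for the full horizon."""
    db = (b_max - b_min) / (nb - 1)
    b_grid = [b_min + i * db for i in range(nb)]
    # Assumed control set: ng equally spaced values in [-g_max, g_max].
    controls = [-g_max + 2.0 * g_max * k / (ng - 1) for k in range(ng)]
    steps = int(round(horizon / dt))
    K = delay_kernel(c, horizon, dt)
    f = [phi(b) for b in b_grid]  # zero time to go: f = phi
    for step in range(1, steps + 1):
        k_s = K[step]  # effect on the terminal state of control applied now
        f = [min(lam * g * g * dt + interp(b_min, db, f, b + k_s * g * dt)
                 for g in controls)
             for b in b_grid]
    return b_grid, f


b_grid, cost = solve_terminal_control(lambda b: b * b)
```

Each pass of the loop produces one function of the single variable b, which is the point of the reduction. Since g = 0 is in the assumed control set, the tabulated cost can never exceed φ(b).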


Variable Coefficients 


The case where the equation describing the system has variable 
coefficients, say 


$$ u'(t) + a_1(t)\,u(t - 1) + a_2(t)\,u(t) = g(t), \tag{16} $$


requires a different treatment based upon the adjoint equation, 
reference [9]. It leads, however, to the same end result as that 
given in the foregoing. 
This result is important since successive approximations ap- 
plied to an equation such as 
u'(t) = ult — 1), 9), (17) 


lead to linear equations with variable coefficients, cf. [3], [4]. 


Terminal Control at Several Points 
Consider the case where it is desired to minimize the function

$$ \varphi(u(T_1), u(T_2), \ldots, u(T_k)). \tag{18} $$


An analysis similar to that just given shows that this problem leads to a function $f(b_1, b_2, \ldots, b_k, T)$ satisfying the functional equation

$$ f(b_1, \ldots, b_k, T) = \min_{g(1)} \left[ \lambda g^2(1)\Delta + f\big( b_1 + K(T_1 - 1)\,g(1)\Delta,\; \ldots,\; b_k + K(T_k - 1)\,g(1)\Delta,\; T - \Delta \big) \right]. \tag{19} $$


Analytic Solution 


If $\varphi(u(T_1), \ldots, u(T_k))$ is a quadratic function of its arguments, the variational problem can be solved analytically in a number of ways. Which of these is preferable depends upon the problem at hand; cf. Kramer [7]. 


A Thermal Control Process 


Let us consider a rod of unit length the ends of which are in con- 
tact with reservoirs at zero degrees. Initially, a certain tempera- 
ture distribution prevails throughout the rod. By using heat 
sources and sinks distributed along the rod, we wish to minimize 
the deviation between the temperature at some particular point 
on the rod at the termination of the process, and a prescribed 
temperature. 

If we let u(x, t) be the temperature in the rod at x at time t, then, with proper normalization, the problem may be cast in the following form. We are given the equations

$$ u_t = u_{xx} + g(x, t), \quad 0 < x < 1, \quad 0 < t < T, \tag{20} $$

$$ u(0, t) = u(1, t) = 0, \quad 0 < t < T, \tag{21} $$

$$ u(x, 0) = f(x), \quad 0 < x < 1. \tag{22} $$

The function g(x, t) is subject to the restrictions

$$ a_1 \le g(x, t) \le a_2, \quad 0 < x < 1, \quad 0 < t < T, \tag{23} $$

and is to be chosen so as to minimize J,

$$ J\{g\} = |u(x_1, T) - a_3|. \tag{24} $$

It is most natural, of course, to consider the state of the rod at any time to be its temperature distribution. For our purposes, though, it is much more useful to introduce the single state variable

b = the temperature which will obtain at point $x_1$ at the termination of the process in the event no control is used from the present until termination of the process, i.e., g(x, t) = 0, t > 0. (25)

In addition, we introduce the function C(b, t), defined by the statement

C(b, t) = the absolute value of the deviation between the prescribed temperature $a_3$ and the actual temperature which materializes at the fixed point $x_1$ at the termination of a control process of duration t, the system being initially in state b, and an optimal control policy being used. (26)


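We shall also make use of the fact that if u(x, t) satisfies equations (20), (21), and (22) with

$$ f(x) = 0, \tag{27} $$

then

$$ u(x, t) = \int_0^1 dy \int_0^t K(x, y; t - s)\,g(y, s)\,ds, \tag{28} $$

where

$$ K(x, y; t - s) = \tfrac{1}{2}\left[ \theta\!\left( \tfrac{x - y}{2},\, t - s \right) - \theta\!\left( \tfrac{x + y}{2},\, t - s \right) \right], \tag{29} $$

and the theta function is defined by the series

$$ \theta(v, t) = 1 + 2\sum_{n=1}^{\infty} e^{-n^2\pi^2 t} \cos 2n\pi v. \tag{30} $$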
=1 
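This leads us to the functional equation

$$ C(b, t + \Delta) = \min_{v} C\!\left( b + \Delta \int_0^1 K(x_1, x; t)\,v(x)\,dx,\; t \right) + o(\Delta), \tag{31} $$

which in the limit becomes

$$ \frac{\partial C}{\partial t} = \min_{v} \left[ \frac{\partial C}{\partial b} \int_0^1 K(x_1, x; t)\,v(x)\,dx \right]. \tag{32} $$

Since the kernel K is known to be nonnegative,² the choice of v is governed by the conditions

$$ v(x) = a_1 \ \text{if}\ \partial C/\partial b > 0, \qquad v(x) = a_2 \ \text{if}\ \partial C/\partial b < 0. \tag{33} $$

If $\partial C/\partial b = 0$, then the nature of v(x) is not obvious.³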
The initial condition is that

$$ C(b, 0) = |b - a_3|. \tag{34} $$

Once again we have reduced the original problem to the determination of a sequence of functions of one variable. 


DISCUSSION 
S. L. Chang⁴ 


The authors' claim to a method for reduction of dimensionality in dynamic programming would be better substantiated if the special class of problems considered in the paper were not solvable by elementary variational methods. The problem studied is that of terminal control of a linear system. Due to the linearity, the terminal variable u(T) can be expressed as a constant representing the initial conditions plus a linear integral of the choice variable g(t). The separability of u(T) into these two terms, or rather the linear dependence on g(t), is sufficient to ensure an easy solution by classical variational methods. 

Consider the problem of the thermal control process. From (28), it is obvious that to give the upper limit U of $u(x_1, T)$

$$ g(y, s) = a_2 \ \text{if}\ K(x_1, y; T - s) > 0, $$

$$ g(y, s) = a_1 \ \text{if}\ K(x_1, y; T - s) < 0, $$

and vice versa for the lower limit L. If $a_3$ is outside the range between U and L, the nearer limit is optimum. If $a_3$ is inside the range between U and L, there are infinite numbers of solutions which give $u(x_1, T) = a_3$.

The value of the paper will be much enhanced if the authors will give a single example of reduction of dimensionality in which the use of dynamic programming is plausible. 


2 Reference [3], pp. 556-559. 
3 Reference [10], chapter 2. 
4 Electrical Engineering Department, New York University, New York, N. Y. 
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There are, of course, many extensions which will be discussed 
elsewhere. 
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Authors’ Closure 


As Dr. Chang points out, the particular example of a thermal 
control process used in our paper is soluble in simple analytic 
terms. This is due to the fact that we wished to simplify the 
analytic details in the course of illustrating the basic ideas of 
reduction of dimensionality. Minor changes in the problem 
will render an explicit analytic solution impossible without chang- 
ing the applicability of the functional equation technique. 

Secondly, let us point out that in the first problem considered, involving time-lags and differential-difference equations, there is a reduction in dimensionality from a problem involving functions of functions to a problem involving a sequence of functions of one variable, the functions {f(b, T)}. This is what is meant by “reduction of dimensionality.” Most likely, Dr. Chang had some other definition of this term in mind when making his remarks. 

Finally, let us emphasize that an “easy” solution by classical variational techniques in many cases is not the final objective. Most often when classical variational techniques are employed, the end result is an Euler equation, a linear or nonlinear differential equation with two-point boundary conditions. The problem of obtaining a computational solution of equations of this nature is not at all trivial and usually constitutes the major part of the difficulty of the original problem. Consequently, what is desired is an alternative approach which replaces the classical solution process by one involving operations which can easily and quickly be performed with a digital computer, and in some cases analytically. This is accomplished by “reduction of dimensionality” techniques in problems of the type discussed in this paper and in references [4] and [7]; see also R. Beckwith, “Analytic and Computational Aspects of Dynamic Programming Processes of High Dimension,” PhD thesis, Purdue, 1959. 
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Design of Optimum Multivariable 


Control Systems 

E. B. LEE¹ 
Senior Research Scientist, 
Minneapolis-Honeywell Regulator Company, 
Minneapolis, Minn. Assoc. Mem. ASME 


The design of optimum controllers is considered for processes which are described by linear differential equations with one or more independent forcing terms. After presentation of a general method for designing what is usually a nonlinear controller, applications are made to: (a) design of minimum energy controllers, (b) design of minimum response time controllers [1],² and (c) design of minimum error controllers. Various other design criteria and restrictions on controller parameters are also discussed. 


Introduction 


Most controller design is based at present on the 
point of view of analysis; i.e., the system is first constructed, at 
least in form, and then the complete system is analyzed to see if 
the performance is satisfactory, after which it is again adjusted 
and then reanalyzed. This procedure may be satisfactory for a 
number of cases in which there are only a few control parameters 
to adjust. Consider, however, the case in which we desire a 
better control (by some measure—which is discussed later) 
and there are several independent forcing functions. It may also 
be known that a nonlinear controller would outperform any linear 
controller. The question is then, what does one do to choose a 
better controller and, even after choosing one, how does one 
analyze it? This is, in most cases, a very difficult question to 
answer. 

This problem can be attacked directly; i.e., as in references 
[1, 2, 3] the performance can be first specified and then one can 
set out to find the controller (subject to certain constraints) 
which will do the specified job. The past few years have seen a 
large amount of effort devoted to the development of methods to 
do just this. Many mathematical criteria have been used to 
specify what constitutes a good controller. Some of these criteria 
are tractable by normal mathematical procedures; others can be 
handled only by the use of large-scale computers. In all design 
of optimum controllers, there appears to be no rhyme or reason 
connected to the methods used. Basically, however, these pro- 
cedures all stem from a common goal to seek an extreme for a 
prescribed functional. 

It is the purpose of this paper to give a self-contained presenta- 
tion of a general design procedure for handling such problems. 
This is done by first formulating the design problem as a quite 
general problem in the calculus of variations. To do this it is 
necessary to discuss a number of meaningful measures of what 
constitutes a good controller. The problem of finding the con- 
troller is then attacked, and it is shown how the parameters can 
be found using a number of different measures of goodness and 
physical constraints. We then consider the question of system 
stability and show that most measures of controller goodness 
constitute Lyapunov functions of certain forms, and thus the 
problem of stability need not concern us further. Lastly, these 
methods are illustrated by considering a number of optimum 


1 Also, Instructor, University of Minnesota, Minneapolis, Minn. 

2 Numbers in brackets designate References at end of paper. 
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control problems; e.g., the time optimal control problem is con- 
sidered along with other optimums and various constraints. Sev- 
eral basic theorems as to what form the controller should have are 
also proved. 


Description of Linear Dynamic Systems 


A brief description is given at this point for those who are not familiar with the concept of a linear process described by a system of linear differential equations, or with the state of a process defined by n dependent parameters of the process. 

Let y be an m-dimensional vector called the process-control vector. Let x be an n-dimensional vector associated with the process output and its first n − 1 derivatives (called the state vector). The process is then completely described by the linear vector differential equation relating the input y and output x:³

$$ \frac{dx}{dt} = A(t)x + B(t)y, \tag{1} $$

where A(t) is an n × n matrix, B(t) is an n × m matrix, and $x(0) = x_0$ is the initial condition vector at time zero.

The solution to the homogeneous equation

$$ \frac{dx}{dt} = A(t)x, \quad x(0) = x_0, \tag{2} $$

is given by

$$ x(t) = \Phi(t)x_0, \tag{3} $$

where Φ is a fundamental matrix solution; i.e., the basis of solutions for the homogeneous system (2). The solution to the complete system is then given by

$$ x(t) = \Phi(t)x_0 + \Phi(t)\int_0^t \Phi^{-1}(t')B(t')y(t')\,dt'. \tag{4} $$


For the linear plant which has the solution (4) the control problem is to find the vector y(t) such as to have x(t) as close as possible to some ideal; e.g., in the regulator problem, to have x(t) as close to some constant vector c as possible no matter what the disturbances. We must, however, consider certain constraints on the allowable value of y; e.g., it may be subject to saturation or the energy for control may be limited. 
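
As a numerical illustration of the variation-of-constants solution, the sketch below evaluates x(t) = Φ(t)[x₀ + ∫₀ᵗ Φ⁻¹(t')B y(t') dt'] for one concrete plant. The plant is an assumption chosen for the example (a double integrator, A = [[0, 1], [0, 0]], B = (0, 1)), as are the function names `phi`, `phi_inv`, and `solve_state` and the use of trapezoidal quadrature.

```python
def phi(t):
    """Fundamental matrix exp(A t) for the assumed plant A = [[0, 1], [0, 0]]."""
    return [[1.0, t], [0.0, 1.0]]


def phi_inv(t):
    """Inverse of the fundamental matrix, here equal to phi(-t)."""
    return [[1.0, -t], [0.0, 1.0]]


def matvec(m, v):
    """Product of a 2 x 2 matrix with a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def solve_state(x0, y, t, n=1000):
    """State at time t under scalar control y(t'), by trapezoidal quadrature."""
    B = [0.0, 1.0]  # assumed input vector: the control drives the second state
    h = t / n
    acc = [0.0, 0.0]
    for k in range(n + 1):
        tp = k * h
        w = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        col = matvec(phi_inv(tp), B)
        u = y(tp)
        acc = [acc[i] + w * h * col[i] * u for i in range(2)]
    inner = [x0[i] + acc[i] for i in range(2)]
    return matvec(phi(t), inner)


# Constant unit control from x0 = (1, 0): closed form is x1 = 1 + t^2/2, x2 = t.
state = solve_state([1.0, 0.0], lambda t: 1.0, 2.0)
```

For this plant the integrand is linear in t', so the trapezoidal rule reproduces the closed-form state up to rounding.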


General Formulation of the Optimum Control Problem 


The problem of control is, stated in general, to find the present 
control decision, subject to certain constraints which will minimize 


3 The reduction of an nth order differential equation with derivatives of the forcing function to this system of equations can be found in Laning and Battin, “Random Processes in Automatic Control,” p. 191. 
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some measure of the deviation from the ideal. One of the most 
important aspects of the problem at this point is to find a mean- 
ingful measure of what constitutes a good controller. It is with 
this question that physical intuition plays an important role. 

It will be assumed that the process under control is adequately 
described by means of a linear differential equation of the form 
(1). The problem of control that we wish to consider is: What 
is the present value of the control vector y (the only time it is of 
real concern is the present time; i.e., it is only the present time at 
which it is necessary to know its value) which will do us the most 
good? To determine what the control vector (decision) should 
be, we must first define a measure of good control. To get such a 
measure let us first define the following loss functions. 

Definition of Loss Function. Loss involved when the process as 
represented by the state vector x is not equal to the prescribed 
ideal state vector ξ. 

The following list indicates a number of the loss functions, L(x, y, t):

(a) Simple loss function,

$$ L(x, \xi) = 1 \ \text{if}\ x \ne \xi, \qquad L(x, \xi) = 0 \ \text{if}\ x = \xi. $$

(b) Linear loss function,

$$ L(x, \xi) = [(x - \xi)\cdot(x - \xi)]^{1/2}, $$

where $x \cdot x = \sum_i x_i^2$ is the vector dot product.

(c) Quadratic loss function,

$$ L(x, \xi) = (x - \xi)\cdot M(x - \xi), $$

where M is a suitable matrix.

(d) Time-dependent loss functions,

$$ L(x, \xi, t) = t^{k} L(x, \xi). $$

(e) and so on. 


Based on the loss function we can now formulate the optimum control problem as a minimum-cost problem, where the cost of control is measured by the integrated loss function. The control problem is then to minimize

$$ C = \int_0^T L(x, y, t)\,dt \tag{5} $$

over the available control vector y(t). 

Usually it is necessary to impose certain physical constraints on 
the controller (control vector); e.g., the cost of the control, 
bounds on the available energy, saturation limits, and so on. To 
handle such constraints we use the usual method employed in the 
calculus of variations, which is to use a Lagrange multiplier. 
This procedure will be apparent when we later consider how the 
optimum forcing vector is determined. 

The problem of controller synthesis can then be viewed as: 
Given a dynamic process to find a decision rule which will map 
the error vector and/or estimate of the desired path optimally into 
an allowable control decision (present value of the control vec- 
tor). 


General Method for Finding the Optimum Controller 

One method to be used in finding the best controller (according 
to the measure being used) is based on a new principle in the cal- 
culus of variations due to Bellman [4]; namely, the “principle of 
optimality.” An optimal policy has the property that, whatever 
the initial state and the initial decision, the remaining decisions 
must constitute an optimal policy with regard to the state result- 
ing from the first decision. 

Consider now the problem of determining the optimum-de- 
cision rule or, what is the same thing, the optimum-present 
decision (the decision as a function of the deviation from the ideal). 
Note first that the minimum value of the cost function is a function of only the initial state $x_0$ and the time T. The minimum value of the cost function can then be written

$$ f(x_0, T) = \min_{y \in \Omega}[C(y)] = \min_{y \in \Omega} \int_0^T L(x, y)\,dt, \tag{6} $$

where we have used Ω to indicate the space of allowable control vectors. It has also been assumed that the ideal is a known function of time which can be written in terms of the time T. 

We will proceed now to derive a functional equation for $f(x_0, T)$. Let y = y(t) be a function yielding the minimum of C(y) and let us, therefore, write:

$$ f(x(0), T) = \min_{y \in \Omega} \int_0^T L(x, y)\,dt = \min_{y \in \Omega} \left[ \int_0^{\delta} L(x, y)\,dt + \int_{\delta}^{T} L(x, y)\,dt \right], \tag{7} $$

$$ T > \delta > 0. \tag{8} $$

Consider the second integral. The effect of any initial choice of y(t) for t in the interval [0, δ] will be, by way of the process differential equation (1), to convert x(0) into a new value at t = δ, which we will call x(δ). It follows then that, whatever the initial choice of y in the interval [0, δ], we will have over the remaining interval [δ, T] a problem of precisely the same form as the original, with the difference that the initial condition is now x(δ) and the remaining time interval is of length T − δ. This argument (an application of the principle of optimality) enables us to write:

$$ f(x(0), T) = \min_{y \in \Omega} \left[ \int_0^{\delta} L(x, y)\,dt + f(x(\delta), T - \delta) \right]. \tag{9} $$
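If we assume the proper continuous dependence of the minimizing function upon the initial conditions and time T, we can rewrite equation (9) using the mean value theorems as:

$$ f(x(0), T) = \min_{y \in \Omega} \left[ L(x, y)\,\delta + f(x(0), T) + \nabla f \cdot (Ax + By)\,\delta - \frac{\partial f}{\partial T}\,\delta + o(\delta) \right]. \tag{10} $$

Canceling f(x(0), T) on both sides and then taking the limit as δ → 0 the foregoing becomes⁴

$$ -\frac{\partial f}{\partial T} + \min_{y \in \Omega} \big\{ L(x, y) + \nabla f \cdot [Ax + By] \big\} = 0. \tag{11} $$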
It is this result that plays a very important role in the choice of 
the optimum decision. The applications and use of this result 
will be apparent when we later consider a number of examples. 

Another means for finding the solution of the calculus of variations problem, equation (5), is to use the classical method of the calculus of variations. This is to postulate that there does exist a function for which the integral (5) attains an extreme value. One must then look for a function close to the optimum which does not change the value of the integral very much and which in the limit is the desired optimal function. By substituting for x in terms of y from equation (4) and noting that ξ(t) is only a function of time t, the integrand of equation (5) becomes $L = L(y, \dot{y}, \ldots, t)$. To solve for the minimum (or maximum) of equation (5) usually leads to the Euler-Lagrange equation⁵

$$ \frac{\partial L}{\partial y} - \frac{d}{dt}\frac{\partial L}{\partial \dot{y}} + \cdots = 0, \tag{12} $$

where L is the integrand of equation (5) and 0 is the null vector. 


4 A similar equation is obtained when the loss function is a function of the time parameter t (see reference [4], p. 263). 

5 For the type of process-equations to be considered this equation 
will be all that is required. 
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The Question of Stability 


As has been pointed out by Tsien [2], it is usually not necessary 
to consider the question of stability when the synthesis problem is 
attacked directly as is the intent of this paper. However, after 
the transient period it is usually necessary to use a second con- 
troller if control beyond the transient period is desired. 

As has been pointed out by Kalman and Bertram [5] it is quite 
easy to ensure stability if the cost function satisfies the hy- 
pothesis of the second method of Lyapunov for analyzing ques- 
tions of stability. The result which will ensure stability is as 
follows: 

Theorem (Kalman [5]). Consider a free, linear stationary dynamic system with an equilibrium state at the origin. If (i) the loss function L(x) defined by equation (5) is of fixed positive sign and zero only when x = 0, and (ii) the cost function C as defined by equation (5) is finite in some neighborhood surrounding the origin, then x → 0 as t → ∞; i.e., the origin is asymptotically stable.

With this result and the previous observation, the question of stability will not be of further concern when it can be shown that the cost function satisfies the hypothesis of the theorem for T = ∞. Also, in the work that follows it will be assumed that the process equation system is nondegenerate (see Kalman [7] 4.2) and that there exists an allowable control that is capable of doing the required control job [8]. 


Minimum-Energy Controllers 


In this section we wish to consider the problem of control subject to limitations on the total energy available for control; i.e., it is desired to do a prescribed job in such a way as to use the least amount of energy. This can be formulated as follows: Let us agree to measure the use of energy by means of the quadratic loss function y·Hy, which will be positive definite if it satisfies the conditions of Sylvester's theorem:

Theorem (Sylvester [6]). In order that the quadratic form y·Hy with H symmetric be positive definite, it is necessary and sufficient that the principal minors of its discriminant, that is, the magnitudes of the determinants

$$ a_{11}, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \ldots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} $$

be all positive. 
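
Sylvester's condition is easy to check numerically for a given weighting matrix. The sketch below evaluates the nested upper-left determinants by Gaussian elimination with partial pivoting. The function names `det` and `is_positive_definite` are hypothetical, and H is assumed to be symmetric and supplied as a list of rows.

```python
def det(m):
    """Determinant of a square matrix by Gaussian elimination with pivoting."""
    n = len(m)
    a = [row[:] for row in m]
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return d


def is_positive_definite(h):
    """True if every nested upper-left determinant of symmetric H is positive."""
    return all(det([row[:k] for row in h[:k]]) > 0.0
               for k in range(1, len(h) + 1))


weights_ok = is_positive_definite([[2.0, -1.0], [-1.0, 2.0]])
weights_bad = is_positive_definite([[1.0, 2.0], [2.0, 1.0]])
```

In the first matrix the determinants are 2 and 3, so the form is positive definite; in the second they are 1 and −3, so it is not.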

Using the positive definite quadratic loss function y·Hy, the cost function becomes:

$$ C = \int_0^T y \cdot Hy\,dt, \quad T > 0, \tag{13} $$

which we wish to minimize over y, subject to the constraint that in the given, or to be found, time interval T the controller does a prescribed job. For example, in an end-point control problem, it could be required that the process reach a certain state; e.g., require that

$$ x(T) = 0 = \Phi(T)x_0 + \Phi(T)\int_0^T \Phi^{-1}(t')B(t')y(t')\,dt'. \tag{14} $$


We will solve this problem for only two situations, which will in- 
dicate how in general similar problems would be solved. 
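Consider first the problem:

$$ \min_{y \in \Omega} \int_0^T (y \cdot Hy)\,dt \tag{15} $$

subject to the constraint that

$$ -x_0 = \int_0^T \Phi^{-1}(t')B(t')y(t')\,dt'. \tag{16} $$

To handle the constraint equation we introduce the vector Lagrange multiplier λ and our optimization problem becomes

$$ \min_{y \in \Omega} \int_0^T \left[ y(t') \cdot Hy(t') + \lambda \cdot \Phi^{-1}(t')B(t')y(t') \right] dt'. \tag{17} $$

To find the minimizing y in this case we use the Euler equation, (12), and the minimizing y for the given time to go T becomes

$$ \frac{\partial}{\partial y}\left[ y \cdot Hy + \lambda \cdot \Phi^{-1}By \right] = [H + H^*]y + [\Phi^{-1}B]^*\lambda = 0 \tag{18} $$

or

$$ y(t) = -[H + H^*]^{-1}[\Phi^{-1}(t)B(t)]^*\lambda = -\tfrac{1}{2}H^{-1}[\Phi^{-1}(t)B(t)]^*\lambda \tag{19} $$

(note * indicates transpose) which we now put into the constraint equation to find λ; i.e.,

$$ -x_0 = -\tfrac{1}{2}\int_0^T \Phi^{-1}(t')B(t')H^{-1}[\Phi^{-1}(t')B(t')]^*\,dt'\,\lambda = -G(T)\lambda. \tag{20} $$

If the matrix G(T) is nonsingular; i.e., det G(T) ≠ 0, then it is possible to solve this equation to get

$$ \lambda = G^{-1}(T)x_0 \tag{21} $$

and the optimum control function over the given interval (0, T) is

$$ y(t) = -\tfrac{1}{2}H^{-1}[\Phi^{-1}(t)B(t)]^*G^{-1}(T)x_0. \tag{22} $$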


There still exist a number of possibilities for this case; e.g., we could look for y as a function of the decreasing time to go, T − t, or as a function of a floating time interval T, or we could look for the best interval of time T. Looking for the best interval of time does not make much sense if the process was originally stable, for we know that in infinite time using no energy we can do the job. The floating-time-interval problem is also trivial. Consider then the problem of finding y as a function of the decreasing time interval using at each point the present state of the process. All we have to do in this case is introduce a new variable for the time to go; i.e., let the limits of integration for our minimization be from the present time t to the time T in equation (17). 
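
The Lagrange-multiplier procedure can be checked on a scalar plant. The sketch below assumes dx/dt = a x + b y with energy measure H = 1, so that Φ(t) = e^{at}, the stationarity condition gives y(t) = −½ b e^{−at} λ, and the end-point constraint x(T) = 0 fixes λ = x₀/G(T) with G(T) = ½∫₀ᵀ b² e^{−2at'} dt'. The plant, the parameter values, and the names `gramian`, `min_energy_control`, and `simulate` are illustrative assumptions, and the closed loop is verified by a fourth-order Runge-Kutta integration.

```python
import math


def gramian(a, b, T):
    """G(T) = 1/2 * integral over [0, T] of b^2 * exp(-2 a t) dt."""
    if a == 0.0:
        return 0.5 * b * b * T
    return 0.5 * b * b * (1.0 - math.exp(-2.0 * a * T)) / (2.0 * a)


def min_energy_control(a, b, x0, T):
    """Open-loop control that drives x(T) to zero with least integral of y^2."""
    lam = x0 / gramian(a, b, T)  # Lagrange multiplier from the constraint
    return lambda t: -0.5 * b * math.exp(-a * t) * lam


def simulate(a, b, x0, y, T, n=2000):
    """Integrate dx/dt = a x + b y(t) on [0, T] with classical RK4."""
    h = T / n
    x, t = x0, 0.0

    def rhs(time, state):
        return a * state + b * y(time)

    for _ in range(n):
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2.0, x + h / 2.0 * k1)
        k3 = rhs(t + h / 2.0, x + h / 2.0 * k2)
        k4 = rhs(t + h, x + h * k3)
        x += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return x


control = min_energy_control(0.5, 1.0, 2.0, 1.5)
x_final = simulate(0.5, 1.0, 2.0, control, 1.5)  # should be close to zero
```

The same pattern extends to the vector case by replacing the scalar G(T) with the matrix integral and the division with a linear solve.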

The minimum-energy control problem will not be discussed further for, as pointed out by a reviewer, this problem has been developed in great detail in a recent paper by Bertram and Sarachik [9]. The point of the example, as shown here, is to indicate the procedure involved in finding controllers using energy as a cost function. 

Various other constraints and expressions for a measure of energy could be used; e.g., we could require that the process output x(t) catch up with some ideal using the minimum amount of energy. If we express the ideal or desired path as a polynomial in time, the problem can be handled in exactly the same way as the example considered. It will also be apparent how a number of other examples could be formulated after we consider the minimum-response-time systems and the minimum-error systems. 


Minimum-Response-Time Controllers 


In this section we will discuss the design of controllers using the 
idea of response time as a measure of how well the controller is 
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performing. Generally, what we want to do is take the process 
from some initial condition at the present time to a prescribed 
condition in the minimum amount of time. Usually the controller 
will be subject to saturation ($|y_i| \le b_i$, which we can indicate by saying y ∈ Ω) or the total energy for control is limited, and so on.

The first problem we will consider in this section is where the controller belongs to a closed bounded region Ω. This leads to the well-known time-optimal controller-design problems, formulated as follows:

The Time Optimal Control Problem. The control problem of concern here is to select an allowable control vector y(t) (allowable as defined in the foregoing) to carry the process, as represented by the state vector x(t), from an arbitrary initial state, $x(0) = x_0$, to a prescribed state ξ(t) in minimum time. 

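This problem can be formulated as a minimization problem as follows:

$$ \min_{y \in \Omega,\; t_r \in (0, \infty)} t_r \tag{23} $$

subject to the constraint

$$ x(t_r) = \xi(t_r) = \Phi(t_r)x_0 + \Phi(t_r)\int_0^{t_r} \Phi^{-1}(t')B(t')y(t')\,dt'. \tag{24} $$

If we consider only the regulator problem the constraint equation can be written, as before,

$$ -x_0 = \int_0^{t_r} \Phi^{-1}(t')B(t')y(t')\,dt'. \tag{25} $$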

We will now prove the following important theorem. 

Theorem. If for some y ∈ Ω, t ∈ (0, ∞), $x(x_0, y, t) = \xi(t)$ for the process equation (1), then there exists a time optimal controller with y(t) on the boundary of Ω for all $t \in (0, t_r)$ ($t_r$ being the minimum response time); i.e., relay control is time optimal for all $t \in (0, t_r)$. (A similar theorem has also been presented by LaSalle [10].)

In connection with this theorem, we state also the following, which considers the problem of minimizing a loss function based only on the error (x − ξ), and where the forcing function y is restricted to a bounded closed region; i.e., y ∈ Ω.

Theorem. If there exists an optimum controller with y ∈ Ω for

$$ \min_{y \in \Omega} \int_0^T L(x - \xi)\,dt $$

and the linear process (1), then there exists an optimum relay controller; i.e., $|y_i| = b_i$ for all t ∈ (0, T). 

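Proof of the Time Optimal Control Theorem. Consider first the constraint equation (24), which we will rewrite in the form

$$ \Phi^{-1}(t_r)\xi(t_r) - x_0 = \int_0^{t_r} \Phi^{-1}(t')B(t')y(t')\,dt'. \tag{26} $$

To handle the constraint equation we use the standard procedure involving a Lagrange multiplier, λ. Our minimization problem then becomes

$$ \min_{y \in \Omega,\; t_r \in (0, \infty)} \left( t_r + \lambda \cdot \left[ \int_0^{t_r} \Phi^{-1}(t')B(t')y(t')\,dt' - \Phi^{-1}(t_r)\xi(t_r) + x_0 \right] \right). \tag{27} $$

Let $t_r$ be fixed and consider the problem

$$ \min_{y \in \Omega} \int_0^{t_r} \lambda \cdot \Phi^{-1}(t')B(t')y(t')\,dt', \tag{28} $$

which can be written in the form

$$ \min_{y \in \Omega} \int_0^{t_r} h(t') \cdot y(t')\,dt', \qquad h(t') = [\Phi^{-1}(t')B(t')]^*\lambda, $$

which clearly has a minimum for y ∈ Ω when

$$ y_i(t) = -b_i\,\operatorname{sign} h_i(t), \quad i = 1, 2, \ldots, m. \tag{29} $$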


This result then establishes the theorem as stated,⁶ for it makes no difference what response time we have used; in fact, the $t_r$ used could have been the minimum. The fact that the controller may not be unique is of no concern. The form of a controller is all that we were after. We note at this point that the theorem does not imply anything regarding the value of the forcing function for time greater than the response time.

It is a quite difficult problem to establish what the minimum response time $t_r$ is and the value of the Lagrange multiplier λ. To find the minimum $t_r$ it is necessary to know that the minimum occurs with a finite number of switches in the control vector components $y_i$; i.e., each element of y should switch from its maximum to its minimum value at only a finite number of times in the time interval $(0, t_r)$. The following theorem is applicable in determining the number of switches in sign when the characteristic roots of the process are real and distinct. The proof is similar to that given by Bellman [4] for the regulator problem.

Theorem. Consider the minimum-response-time controller problem as given by equation (23) with the constraint given by equation (24). If the matrix A has real distinct characteristic roots (A, B constant matrices) then each element of the time optimal control vector, $y_i$, will change sign at most n − 1 times in the interval $[0, t_r]$ (n is the order of the process). 

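Proof. From equation (29) for the optimum forcing function we note that if the process has real distinct roots, then each element of the matrix

$$ \Phi^{-1}(t) = \Phi(-t) = e^{-At} $$

will be made up of terms of the form $e^{-\mu_j t}$, where the $\mu_j$ denote the characteristic roots of A; thus equation (29) can be written in the form

$$ y_i(t) = -b_i\,\operatorname{sign}\left[ \sum_{j=1}^{n} c_{ij}\,e^{-\mu_j t} \right]. $$

The proof now follows, being due to the fact that the equation

$$ \sum_{j=1}^{n} c_j\,e^{-\mu_j t} = 0 $$

has at most n − 1 real zeros. This can be proved by induction.⁷ 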
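
The bound on the number of real zeros of a sum of n real exponentials can be illustrated by counting sign changes on a fine grid. The sketch below is a numerical illustration only, not a proof; the example coefficients are an assumption, chosen so that e^{2t} − 3e^{t} + 2 = (e^{t} − 1)(e^{t} − 2) has exactly two zeros, at t = 0 and t = ln 2. The function names `exp_sum` and `count_sign_changes` are hypothetical.

```python
import math


def exp_sum(coeffs, rates, t):
    """Evaluate the sum of c_j * exp(r_j * t)."""
    return sum(c * math.exp(r * t) for c, r in zip(coeffs, rates))


def count_sign_changes(coeffs, rates, t_lo, t_hi, samples=3001):
    """Count sign changes of the exponential sum on a uniform grid."""
    changes = 0
    last = 0  # sign of the most recent nonzero sample
    for k in range(samples):
        t = t_lo + (t_hi - t_lo) * k / (samples - 1)
        v = exp_sum(coeffs, rates, t)
        s = (v > 0) - (v < 0)
        if s != 0:
            if last != 0 and s != last:
                changes += 1
            last = s
    return changes


# Three distinct real rates, so at most two switches are possible.
switches = count_sign_changes([1.0, -3.0, 2.0], [2.0, 1.0, 0.0], -1.0, 2.0)
```

With n = 3 terms the example attains the maximum of n − 1 = 2 sign changes, which corresponds to a relay element switching twice.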

To solve this problem it is not enough to have the preceding theorems; rather, one must give a method for finding y(t) as a function of the process state x. It will now be shown how it is possible to solve this problem in quite general situations. The second-order problem is easy even with two independent forcing functions [1]. The problem we will consider is the most general so far solved (see also Lee [11]). The method depends on a good deal of computing capacity being available for the controller, or on having available a good method for storing information. 

Based on the theorem concerning the number of sign switches 
in the distinct real-root case, it will be assumed that a finite, but 


6 It is well known that if there exists one allowable control which does the prescribed job for the linear process-equation (1) then there exists an optimum control. 

7 See Pólya and Szegő, “Aufgaben und Lehrsätze aus der Analysis,” p. 49. 
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unknown, number of sign switches will include the time optimal 
controller. One can then proceed as follows: Let $k_i$ be the number of switching times for the independent forcing-function element $y_i$, assuming that y(t) is to go to zero at $t = t_r$. The integration of equation (24) is now performed to obtain:

$$ -x_0 = -\Phi^{-1}(t_r)\xi(t_r) + \int_0^{t_r} \Phi^{-1}(t')B(t')y(t')\,dt' $$

or

$$ -x_{10} = g_1(t_r) + \int_0^{t_r} \psi_{11}(t')y_1(t')\,dt' + \int_0^{t_r} \psi_{12}(t')y_2(t')\,dt' + \cdots + \int_0^{t_r} \psi_{1m}(t')y_m(t')\,dt', \tag{30} $$

with similar equations for $-x_{20}, \ldots, -x_{n0}$. Carrying out the integration of equation (30), letting $t_{i1}, t_{i2}, \ldots, t_{ik_i}$ be the $k_i$ switching times for $y_i$, we obtain

$$ -x_1(0) = G(t_{11}, t_{12}, \ldots, t_r). $$

At this point the computer is called upon to find the minimum $t_r$ with an ordered set of switching times

$$ t_{i1} < t_{i2} < \cdots < t_{ik_i} < t_r. $$

This then, in effect, completely solves the time optimal controller problem when a good estimate of the desired path ξ(t) is known. The actual use of this method is being reported by Smith [12]. 

Another minimum-response-time controller of great interest is 
one in which the energy for control is to be conserved and it is 
required to minimize response time. 

Minimum-Response-Time Controller With an Energy Constraint. Here we consider the problem

$$ \text{Minimize} \int_{t_0}^{t_r} \left[ p(t') + y \cdot Hy \right] dt', \tag{31} $$

where the quadratic form y·Hy is again used to measure the energy, the members of H are selected to get the desired weighting of response time and energy used, and p(t) is a polynomial in time selected in such a way as to weight the importance of fast response. The Sylvester theorem is used to ensure that the quadratic form is positive definite. Again we are faced with the constraint equation 


x(t,) = &(t,) = D(t,)z(&) + y(t’ )dt’ (32) 


which can be written in the form 
— z(t) 
tr 
E(t’ 
t 


Introducing the Lagrange multiplier $\lambda$ to handle the constraint
equation, we can write our minimization equation as


tre(to, @) 
d|—®-(t' E(t’) 
+ | dt’ (34) 


from which, using the Euler-Lagrange equation, the optimizing $y$
for a fixed $t_r$ becomes
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Kt) = (35) 


which we then put into the constraint equation (33) to obtain
the Lagrange multiplier $\lambda$; i.e.,


A = g(t, + hit, to) (36) 


assuming that the matrix 


is nonsingular. We have yet to minimize the response time $t_r$.
This can be accomplished by considering equation (31) with the
known optimum $y$ for a fixed time $t_r$. After integrating equation
(31) the minimization problem has the form

$$\min_{t_r \in (t_0, \infty)} \left[ f(t_0, t_r, z(t_0)) \right] \qquad (37)$$

In most cases, for the type of problem considered, the function $f$
will be continuous with continuous partial derivatives. The
minimum $t_r$ will then occur, for fixed $\xi$ and $z(t_0)$, at one of the
points where $\partial f / \partial t_r = 0$, or where $t_r = t_0$, or where $t_r = \infty$.
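
As a small worked illustration of this last step, the sketch below assumes the simplest possible case, which is not taken from the paper: a scalar process $\dot{z} = y$ starting at $z(t_0) = z_0$ with $t_0 = 0$, a constant weight $p$ on time, and an energy weight $h$. For a fixed $t_r$ the minimum-energy control is the constant $y = -z_0/t_r$, so the integrated cost is $f(t_r) = p\,t_r + h z_0^2 / t_r$, and its stationary point is $t_r = |z_0|\sqrt{h/p}$. The function names are hypothetical.

```python
import math

def total_cost(t_r, z0, p, h):
    """Integrated cost for the assumed process z' = y driven from z0 to 0
    in time t_r by the constant minimum-energy control y = -z0 / t_r."""
    return p * t_r + h * z0 * z0 / t_r

def golden_section_min(f, lo, hi, tol=1e-10):
    """Locate the minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

z0, p, h = 2.0, 1.0, 4.0
t_numeric = golden_section_min(lambda t: total_cost(t, z0, p, h), 1e-6, 100.0)
t_closed = abs(z0) * math.sqrt(h / p)  # root of d(cost)/d(t_r) = 0
```

Here the cost grows without bound at both ends of the interval, so the interior stationary point is the minimum; in other problems the end points $t_r = t_0$ and $t_r = \infty$ would have to be compared as well.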

Again we note that there are many other possibilities as to the 
type of minimum-response-time controller. Most of these prob- 
lems can be handled by the method used to solve the two ex- 
amples presented. 

The last type of optimum controller to be considered will be 
the following type, where it is desired to minimize the difference 
between some ideal path and the actual path subject to certain 
constraints. 


Minimum-Error Systems 


In this section we will consider the problem of selecting the con- 
trol from an allowable set such as to minimize some average 
deviation from an ideal path. In this area we usually have what 
are called following systems. 

There are many different ways that the deviation from the ideal 
can be measured. In this section we will be able to consider only 
a couple of these. It is felt that the extension to the other cases 
will be apparent from the examples that are considered. There 
are many different ways that the controller can be restricted; e.g., 
it can be subject to saturation (a case which has commanded a 
great deal of our attention due to its common occurrence) or the 
resources for control will be limited, and so on. 

In the section which follows we will consider the case when we 
wish to minimize error, by some measure, and the energy for con- 
trol is to be conserved. 

Minimum-Error Controller With an Energy Constraint. In this sec- 
tion we will consider the problem of minimizing 


T 


where H is a suitably chosen matrix to weigh the importance of 
tight control and expenditure of control energy. The one-dimen- 
sional concept of this problem has been treated previously with 
remarkable results by Merriam [13]. We will proceed in some- 
what the same manner as did Merriam to establish results for the 
multidimensional problem. If y is allowed to take on all real 
values, we can write, by means of equation (11), the following 
partial differential equation for the optimum y; 


; H™B*vf (39) 


Let us consider the quadratic loss function 
$$L(x - \xi) = (x - \xi) \cdot M(x - \xi) \qquad (40)$$
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We now substitute the optimum y in equation (11) to obtain the 
partial differential equation 


of ates + Vf-Ar (41) 


where $\tau = T - t$. To solve this equation assume that $f(x(t), \tau)$
can be written as a power series in its variables; i.e.,

$$f(x(t), \tau) = C_0(\tau) + C_1(\tau) \cdot x + x \cdot C_2(\tau)x + \ldots \qquad (42)$$

where $C_0$ is a scalar, $C_1$ an $n$-vector, $C_2$ an $(n \times n)$ matrix with
$C_2 = C_2^*$, and so forth. We need only consider the first three terms
of this expansion as all other terms will turn out to be zero. The
partial derivatives of this equation as required by equation (41)
are

$$\nabla f = C_1(\tau) + [C_2(\tau) + C_2^*(\tau)]x = C_1(\tau) + 2C_2(\tau)x$$

$$\frac{\partial f}{\partial \tau} = C_0'(\tau) + C_1'(\tau) \cdot x + x \cdot C_2'(\tau)x \qquad (43)$$


Substituting this result into equation (41) we obtain: 


1 
C,’ + + = &-ME 9 BH-'B*Cc,-C, 


— [[M + + 2C,.BH-'B*C, — A*C,)-z 

+ [M — 2C,*BH-'B*C, + 2A*C, (44) 
Since the $x_i(t)$ are arbitrary and linearly independent, the only
way this equation can hold is for us to require that


1 
Cy’ = E |-c 
«= + + 2C.*BH-B*C, A*C;] 


and 
+ (C2) = + [ > [ (4) 
a set of $(n + 1)(n + 2)/2$ ordinary differential equations for the
controller gains. The


boundary condition for this set of equations is found from equation
(42): when $\tau = 0$, $f(x, \tau) = 0$; hence each coefficient must be
zero; i.e.,

$$C_0(0) = 0, \qquad C_1(0) = 0, \qquad C_2(0) = 0 \qquad (46)$$


To use this as a controller it is necessary either to store the solutions
to this set of differential equations for various ideal paths
$\xi$ or have available computer equipment to solve the equations on
line using the best present guess at what the ideal path $\xi$ is.
Merriam in his thesis [13] gives results for second-order systems 
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which indicate that a large improvement of system performance 
over usual designs can be achieved. We have extended his result
in a quite general manner.
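
The on-line solution of the gain equations can be sketched for the one-dimensional case. The example below is an illustration under stated assumptions, not the paper's own equations: it assumes a scalar process $\dot{x} = ax + by$, the loss $m(x - \xi)^2 + hy^2$, and a constant ideal path $\xi$. Substituting $f = C_0 + C_1 x + C_2 x^2$ into the resulting functional equation then gives $C_2' = m + 2aC_2 - (b^2/h)C_2^2$, $C_1' = -2m\xi + aC_1 - (b^2/h)C_1C_2$ and $C_0' = m\xi^2 - (b^2/4h)C_1^2$, all starting from zero at $\tau = 0$. The function names are hypothetical.

```python
import math

def gain_rates(c, a, b, m, h, xi):
    """Right-hand sides of the assumed scalar gain equations in tau = T - t."""
    c0, c1, c2 = c
    s = b * b / h
    return (m * xi * xi - 0.25 * s * c1 * c1,
            -2.0 * m * xi + a * c1 - s * c1 * c2,
            m + 2.0 * a * c2 - s * c2 * c2)

def integrate_gains(a, b, m, h, xi, tau_end, steps):
    """Classical fourth-order Runge-Kutta from the boundary condition C(0) = 0."""
    dt = tau_end / steps
    c = (0.0, 0.0, 0.0)

    def shifted(base, k, scale):
        return tuple(bi + scale * ki for bi, ki in zip(base, k))

    for _ in range(steps):
        k1 = gain_rates(c, a, b, m, h, xi)
        k2 = gain_rates(shifted(c, k1, 0.5 * dt), a, b, m, h, xi)
        k3 = gain_rates(shifted(c, k2, 0.5 * dt), a, b, m, h, xi)
        k4 = gain_rates(shifted(c, k3, dt), a, b, m, h, xi)
        c = tuple(ci + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                  for ci, p, q, r, s in zip(c, k1, k2, k3, k4))
    return c

def control(x, c, b, h):
    """Optimum forcing function y = -(b / 2h) * df/dx with df/dx = C1 + 2*C2*x."""
    return -b * (c[1] + 2.0 * c[2] * x) / (2.0 * h)

c0, c1, c2 = integrate_gains(a=-1.0, b=1.0, m=1.0, h=1.0, xi=0.5,
                             tau_end=10.0, steps=2000)
```

For a long remaining time $\tau$ the gain $C_2$ settles at the positive root of $m + 2aC_2 - (b^2/h)C_2^2 = 0$, which is $\sqrt{2} - 1$ for the values used here, so a stored table of gains would only be needed near the end of the interval.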


Conclusions 


The method presented for synthesizing controllers for the linear 
process with more than one forcing function seems to be a logical 
one. In fact, its use in the design of controllers for the single- 
degree-of-freedom process will in many cases lead to a more 
satisfactory controller than that found by the classical tech- 
niques, this being true because the controller is not limited to 
being a linear one. 

The purpose of the examples has been to illustrate a general 
procedure for the time-domain synthesis of controllers including 
the case when there are several independent forcing functions to 
control. We have not attempted to exhaust the large number of 
possibilities to which this general formulation can be extended. 
Rather, the attempt has been made to indicate the wide range of 
application and the ease with which these problems can be solved. 

An important area which was not considered is where the 
input, or what has been called the ideal output, is not known for 
any length of time. Also, we have not considered the case where 
the process was subject to unknown external disturbances. It is 
hoped that later work will lead to results in this area and thus 
add to the importance of this method as a general design tool. 
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Kinetic Lyapunov Function for Stability 
Analysis of Nonlinear Control Systems 


A kinetic Lyapunov function is a Lyapunov function of the first derivatives of the state
variables. Its use leads to a sufficient condition for the asymptotic stability in the large
of a general nonlinear system without hysteresis. The foregoing sufficient condition is
similar to but more stringent than the local stability condition for linearized systems.


Introduction 


LYAPUNOV’s second method¹ is one of the most general
and powerful ways of analyzing the stability of nonlinear and
time-dependent dynamical systems. While it has not received
the attention it deserves in American literature during the past
few decades, a recent series of papers by Kalman and Bertram²,³
has done much toward making up for the lost time. One basic
difficulty remains to be overcome: The lack of a systematic way 
of selecting a Lyapunov function, whenever it exists, greatly 
hampers the effectiveness of the method. While an ingenious 
choice of a Lyapunov function is sufficient to prove a system’s 
stability in the strongest sense, i.e., uniform asymptotic stability 
in the large, failure of such a choice does not prove the opposite. 
The Lyapunov function may exist, but one has not found it. At 
least it appears to the author that this basic difficulty is here to 
stay. Further development on Lyapunov’s second method is 
likely to be similar to that of the theory of differential equations 
itself: Special methods are worked out which solve special types 
of problems but there is no general solution for all. 

The kinetic Lyapunov functions described in this paper are
essentially a subclass of Lyapunov functions, although
the method is arrived at from a slightly different point of view:
Instead of finding the condition for the state variables $x$ to
approach some equilibrium value $x_e$, a sufficient condition for the
time derivatives $\dot{x}$ to approach zero is found. Obviously, as $\dot{x}$
approaches zero, the system arrives at one of its equilibrium
points, and the two results are equivalent if the equilibrium point
is unique. The method has the following advantages:


1 For different sets of steady-state inputs, the equilibrium
values $x_e$ are different. In cases where the kinetic Lyapunov
function is independent of $x_e$, the stability problem can be settled
once and for all.

2 The kinetic Lyapunov function leads naturally to lineariza- 
tion. 


One major disadvantage is that the condition is stronger than necessary.
It is quite possible for a dynamical system to be uniformly asymp- 

1 A. M. Lyapunov, “Problème général de la stabilité du mouvement”
(in French), Annales de la Faculté des Sciences de Toulouse, vol.
9, 1907, pp. 203-474, reprinted in Annals of Mathematical Study, no.
17, 1949, Princeton University Press, Princeton, N. J.

2 R. E. Kalman and J. E. Bertram, “Control System Analysis and
Design Via the ‘Second Method’ of Lyapunov, I Continuous-Time
Systems,” JOURNAL OF BASIC ENGINEERING, series D, Trans. ASME,
vol. 82, 1960, pp. 371-393.

3 R. E. Kalman and J. E. Bertram, “Control System Analysis and
Design Via the ‘Second Method’ of Lyapunov, II Discrete-Time
Systems,” JOURNAL OF BASIC ENGINEERING, series D, Trans. ASME,
vol. 82, 1960, pp. 394-400.

Contributed by the Instruments and Regulators Division of THE
AMERICAN SOCIETY OF MECHANICAL ENGINEERS and presented at
the Joint Automatic Control Conference, Cambridge, Mass., September
7-9, 1960. Manuscript received at ASME Headquarters,
May 31, 1960. ASME Paper No. 60-JAC-7.
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totically stable in the large without having a kinetic Lyapunov 
function. 

The idea of a kinetic Lyapunov function is not essentially new.
It has been anticipated by Krasovskii’s theorem⁴ in which $\|\dot{x}\|^2$
is used as the Lyapunov function.


The Main Theorem 


Consider a continuous time dynamical system

$$\frac{dx}{dt} = f(x, t) \qquad (1)$$

where $f$ is a continuous function of $x$ and $t$, having continuous
finite first partial derivatives with respect to $x$. A kinetic Lyapunov
function $K(x, \dot{x}, t)$ is defined as follows:

(i) $K(x, \dot{x}, t)$ is continuous in $x$, $\dot{x}$, and $t$, positive definite in $\dot{x}$,
and is finite for finite $\|\dot{x}\|$ as is represented by the following
inequality:

$$0 < \alpha(\|\dot{x}\|) \le K(x, \dot{x}, t) \le \beta(\|\dot{x}\|) \qquad (2)$$

where $\alpha$ and $\beta$ are nondecreasing functions of $\|\dot{x}\|$, $\alpha(0) = \beta(0) = 0$,
and $\alpha(\|\dot{x}\|)$ and $\beta(\|\dot{x}\|)$ approach infinity as $\|\dot{x}\|$ approaches
infinity.

(ii) The total time derivative $(d/dt)K(x, \dot{x}, t)$ satisfies

$$\frac{d}{dt}K(x, \dot{x}, t) \le -\gamma(\|\dot{x}\|) < 0 \qquad (3)$$

except at $\dot{x} = 0$ where $\gamma(0) = 0$.
Inequalities (2) and (3) are to be satisfied for all values of $x$.


Theorem 


A dynamical system always converges to one of its equilibrium
points $x_e$ if (i) there exists a kinetic Lyapunov function, and (ii)
for any given $r > 0$, there exists an $m(r) > 0$, such that $\|x - x_e\| > r$
for all $x_e$ implies that $\|\dot{x}\| > m(r)$.

The proof of the foregoing theorem is identical with the proof
of Theorem 1 of footnote 2 except that $\dot{x}$ is used instead of $x$.
As the system is UASL (uniformly asymptotically stable in the
large) about $\dot{x} = 0$, it converges toward one of the equilibrium
points. The proof is given in Appendix 1 for completeness.

The fact that the existence of a kinetic Lyapunov function is 
a rather stringent condition is clearly illustrated by the following 
proposition : 

Proposition 1. A dynamical system having one or more per- 
manent unstable equilibrium points cannot have a kinetic 
Lyapunov function. 


4 N. N. Krasovskii, “On the Stability in the Large of a System of
Nonlinear Differential Equations” (in Russian), Prikladnaya Matematika
i Mekhanika, vol. 18, 1954, pp. 735-737; also “On Stability
Under Large Perturbations” (in Russian), ibid, vol. 21, 1957, pp. 309-319.
Also Theorem 4 of footnote 2.
Proof. Let $x_e$ be a permanent equilibrium point

$$f(x_e, t) = 0 \quad \text{for all } t \qquad (4)$$

Differentiating equation (4) gives

$$\frac{\partial}{\partial t} f(x_e, t) = 0 \qquad (5)$$

In the vicinity of $x_e$, $f(x, t)$ can be expanded into

$$f(x, t) = -A(x_e, t)(x - x_e) + O(\|x - x_e\|^2) \qquad (6)$$

where $A$ denotes the matrix with elements

$$a_{ij}(x, t) = -\frac{\partial f_i}{\partial x_j} = -\frac{\partial \dot{x}_i}{\partial x_j} \qquad (7)$$

In a region $R$ with $\|x - x_e\|$ sufficiently small, the linear part of
equation (6) represents the system to any desired degree of
accuracy. If the system is unstable about $x_e$, some eigenvalues of
$A(x_e, t)$ have negative real components. Since

$$\ddot{x} = \frac{\partial f}{\partial t} + \sum_j \frac{\partial f}{\partial x_j}\dot{x}_j$$

equations (5) and (6) give for all points within the region $R$

$$\ddot{x} = -A(x_e, t)\dot{x} \qquad (8)$$

It follows that a kinetic Lyapunov function cannot exist within $R$.

Proposition 2. For time-invariant systems condition (ii) of
the kinetic Lyapunov function can be relaxed by allowing
$(d/dt)K(x, \dot{x}) = 0$ at a finite number of isolated points in $x$: $x = \xi_i$.
The term “isolated points” can be precisely defined as follows:
For any small $\delta > 0$, there exists an $\epsilon > 0$ such that all the points
$x$ satisfying

$$-\epsilon\gamma(\|\dot{x}\|) < \frac{d}{dt}K(x, \dot{x}) \le 0 \qquad (9)$$

are in the vicinities of the $\xi_i$’s:

$$\|x - \xi_i\| < \delta$$

The only changes in the proof of the theorem are using $\epsilon\gamma$ instead
of $\gamma$ and adding to $T$ the additional time required for the
system to pass through the $\xi_i$ which are not equilibrium points:

$$\Delta t = \frac{N \cdot 2\delta}{\inf\{\|f(\xi_i)\|\}}$$

where $N$ is the number of $\xi_i$’s which are not equilibrium points,
and $\inf\{\|f(\xi_i)\|\}$ is the lowest value of $\|\dot{x}\|$ at these $\xi_i$’s. Since $\dot{x}$
is completely determined by $x$, and $K(x, \dot{x})$ is decreasing, the
system cannot pass the same $\xi_i$ more than once. Except in the
intervals represented by $\Delta t$, condition (ii) of the theorem is
satisfied for all values of $t$, and $\Delta t$ can be made as small as desired
by choosing $\delta$ sufficiently small. There may be $\xi_i$’s which are
stable equilibrium points; however, once the system comes close
enough to any of these points it converges toward that point due
to equation (6).

Corollary 1. A dynamical system is uniformly asymptotically 
stable in the large if there exists a kinetic Lyapunov function and 
only one stable equilibrium point. 

The corollary can be proved by noting that since $f$ has finite
first partial derivatives, a nondecreasing function $M$ can be found
such that $\|\dot{x}\| \le M(\|x - x_e\|)$. Since $A$ is positive definite near $x_e$,
equation (6) gives

$$\|\dot{x}\| \ge \lambda\|x - x_e\| \qquad (10)$$


At (11) 
in the vicinity of $x_e$, where $\lambda$ is the lowest eigenvalue of $A$. Thus
UASL about $\dot{x} = 0$ is equivalent to UASL about $x = x_e$.
Corollary 2. A time-invariant dynamical system having only one
stable equilibrium point is asymptotically stable in the large if there
exists a positive definite symmetrical constant matrix $B$ such that

$$A'(x)B + BA(x)$$

is also positive definite for every $x$, where $A(x)$ is defined by
equation (7).⁵
Proof. From equation (1)

$$\ddot{x}_i = \frac{d}{dt}f_i(x) = \sum_j \frac{\partial f_i}{\partial x_j}\dot{x}_j = -\sum_j a_{ij}\dot{x}_j$$

The foregoing equations can be written in matrix form

$$\ddot{x} = -A\dot{x} \qquad (12)$$

$$\frac{d}{dt}(\dot{x}'B\dot{x}) = \ddot{x}'B\dot{x} + \dot{x}'B\ddot{x} + \dot{x}'\dot{B}\dot{x} = -\dot{x}'(A'B + BA - \dot{B})\dot{x} \qquad (13)$$

Therefore, if both $B$ and $A'B + BA - \dot{B}$ are positive definite,
the quadratic function $\dot{x}'B\dot{x}$ is a kinetic Lyapunov function. In
case $B$ is constant, $\dot{B} = 0$, and corollary 2 is proved.

While local stability about a point $x$ requires only the existence
of $B$ such that $A'B + BA$ is positive definite at $x$, UASL requires
$A'B + BA$ to be positive definite for the same constant $B$ but all
possible $A$ as $x$ varies. Using a matrix $B$ which is dependent
on $x$ does not solve the problem since

$$\dot{B} = \sum_j \frac{\partial B}{\partial x_j}\dot{x}_j$$

is generally not negative definite.
Corollary 2 gives one way of finding a kinetic Lyapunov function;
i.e., finding a positive definite matrix $B$ with constant
elements such that $A'(x)B + BA(x)$ is positive definite for all $x$.


Some Useful Algebraic Relations 


The following algebraic relations can be proved readily: 


1 For a symmetrical matrix A, the following statements are 
equivalent: 


(a) $A$ is positive definite.

(b) There exists a nonsingular matrix B such that B’B = A. 
(c) All eigenvalues of A are positive. 

(d) The determinants of all principal minors of A are positive. 
(e) There exist positive definite matrices B and C such that 


ay; = for all 


2 For arbitrary A there exists a symmetrical and positive 
definite B such that A’B + BA is positive definite, if and only if 
all the eigenvalues of A are positive. The matrix B will be re- 
ferred to as an orientator of A. 

3 Let $A(x) = A_0 + f(x)A_1$, where $A_0$ and $A_1$ are constant
matrices, and $L < f(x) < U$. A symmetrical and positive
definite matrix $B$ is an orientator of $A(x)$ for all $x$, if and only
if $B$ is a common orientator of both $A_0 + LA_1$ and $A_0 + UA_1$.

4 There exists a common orientator $B$ for matrices $A$ and $C$ if
and only if there is a nonsingular transformation $T$ such that both
$TAT^{-1}$ and $TCT^{-1}$ are positive definite. Then $B = T'T$ (see
Appendix 2 for proof).


5 Corollary 2 can also be considered a result of Krasovskii’s theorem⁴
by allowing a linear transform on $\dot{x}$: $K = \|T\dot{x}\|^2$ where $T'T = B$.
Illustrative Examples


Example 1. Steady-State Condition of a System With Inputs. Consider
a modified version of Example 2 of Kalman and Bertram²

$$\dot{x}_1 = x_2 - ax_1(x_1^2 + x_2^2) + i_1(t) \qquad (14)$$

$$\dot{x}_2 = -x_1 - ax_2(x_1^2 + x_2^2) + i_2(t) \qquad (15)$$

Assuming that the inputs $i_1(t)$ and $i_2(t)$ settle down to constant
values $i_1$ and $i_2$, the problem is to determine if the system is
UASL about its equilibrium point wherever it may be.

By differentiating equations (14) and (15) with respect to $t$, the
matrix $A$ is found to be

$$A = \begin{pmatrix} a(3x_1^2 + x_2^2) & 2ax_1x_2 - 1 \\ 2ax_1x_2 + 1 & a(x_1^2 + 3x_2^2) \end{pmatrix}$$

Since $A' + A$ is positive definite except at one point $x_1 = x_2 = 0$,
$\dot{x}_1^2 + \dot{x}_2^2$ is the kinetic Lyapunov function. The system is
UASL about its equilibrium point.
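
The claim of Example 1 can be checked numerically. The sketch below is only an illustration: it assumes the system has the form $\dot{x}_1 = x_2 - ax_1(x_1^2 + x_2^2) + i_1$, $\dot{x}_2 = -x_1 - ax_2(x_1^2 + x_2^2) + i_2$ with $a > 0$, picks arbitrary constant inputs, and confirms that $K = \dot{x}_1^2 + \dot{x}_2^2$ does not increase along a simulated trajectory. The function names, the input values and the starting point are hypothetical choices.

```python
def f(x, a, i1, i2):
    """Assumed right-hand side of equations (14) and (15) with constant inputs."""
    x1, x2 = x
    r2 = x1 * x1 + x2 * x2
    return (x2 - a * x1 * r2 + i1, -x1 - a * x2 * r2 + i2)

def sym_part_is_pd(x, a):
    """Sylvester test on A' + A, where A = -df/dx."""
    x1, x2 = x
    s11 = 2.0 * a * (3.0 * x1 * x1 + x2 * x2)
    s22 = 2.0 * a * (x1 * x1 + 3.0 * x2 * x2)
    s12 = 4.0 * a * x1 * x2
    return s11 > 0.0 and s11 * s22 - s12 * s12 > 0.0

def rk4_step(x, dt, a, i1, i2):
    """One classical Runge-Kutta step for the two state variables."""
    def add(u, v, s):
        return (u[0] + s * v[0], u[1] + s * v[1])
    k1 = f(x, a, i1, i2)
    k2 = f(add(x, k1, 0.5 * dt), a, i1, i2)
    k3 = f(add(x, k2, 0.5 * dt), a, i1, i2)
    k4 = f(add(x, k3, dt), a, i1, i2)
    return (x[0] + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
            x[1] + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]))

def kinetic_history(x0, a, i1, i2, dt, steps):
    """Values of K = x1dot^2 + x2dot^2 along the trajectory from x0."""
    x = x0
    history = []
    for _ in range(steps + 1):
        d1, d2 = f(x, a, i1, i2)
        history.append(d1 * d1 + d2 * d2)
        x = rk4_step(x, dt, a, i1, i2)
    return history

history = kinetic_history((1.0, 0.5), a=1.0, i1=0.5, i2=-0.3, dt=1e-3, steps=20000)
```

Because $K$ does not depend on where the equilibrium point lies, the same check applies unchanged to any other pair of constant inputs.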
Fig. 1 Block diagram of a system with nonlinear gain element


Example 2. Systems With a Nonlinear Gain Element. A system
with a nonlinear gain element is represented by the block diagram
in Fig. 1. In terms of state variables $x$, its system equation can
be written as

$$\dot{x} = -A_0x - bf(x_1) + [g_0r(t) + g_1\dot{r}(t) + \ldots] \qquad (16)$$

For inputs for which the bracketed terms are constants, a
steady-state equilibrium point $x_e$ exists, and the system’s stability
about $x_e$ is to be investigated.

Differentiating equation (16) gives

$$\ddot{x} = -[A_0 + f'(x_1)A_1]\dot{x} \qquad (17)$$

where $A_1$ is the matrix with $b$ as its first column and zero as all
other elements. Let $U$ and $L$ denote the upper and lower bounds
of $f'(x_1)$: $L < f'(x_1) < U$. The system is UASL if a common
orientator can be found for $A_0 + LA_1$ and $A_0 + UA_1$.
First-Order System.

$$\dot{x} = -ax - f(x) + r(t)$$

The matrices $A_0$ and $A_1$ have only one element each, $a$ and 1,
respectively. The sufficient condition for UASL is simply
$a + f'(x) > 0$. This is also the necessary condition if the system is to
be stable for every $x_e$.
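Second-Order System. Consider a second-order system described
by the equations

$$\dot{x}_1 = x_2$$
$$\dot{x}_2 = -ax_2 - f(x_1) \qquad (18)$$

where the derivative of $f(x_1)$ is bounded by $L < f'(x_1) < U$.
Appendix 3 shows that a common orientator can be found for the
two matrices

$$\begin{pmatrix} 0 & -1 \\ L & a \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & -1 \\ U & a \end{pmatrix}$$

if

(i) $L > 0$, $a > 0$,
and

(ii) $U < (a + \sqrt{L})^2$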
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Condition (i) is necessary for local stability, while conditions (i) 
and (ii) together are sufficient for UASL. 

However, condition (ii) is too strong as will be shown presently.
Without loss of generality, one may assume $f(0) = 0$ and use as
a trial Lyapunov function

$$V(x) = F(x_1) + \tfrac{1}{2}(ax_1 + x_2)^2 \qquad (19)$$

where

$$F(x_1) = \int_0^{x_1} f(x)\,dx$$

From equation (19) one obtains

$$\dot{V}(x) = f(x_1)\dot{x}_1 + (ax_1 + x_2)(a\dot{x}_1 + \dot{x}_2) = f(x_1)x_2 + (ax_1 + x_2)[ax_2 - ax_2 - f(x_1)] = -ax_1f(x_1) \qquad (20)$$

Therefore $V(x)$ is a Lyapunov function if both $ax_1f(x_1)$ and $F(x_1)$
are positive definite. The latter is true if $L > 0$ and $a > 0$, and
condition (ii) is not necessary.
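
A numerical sketch can make the last remark concrete. It assumes the second-order system $\dot{x}_1 = x_2$, $\dot{x}_2 = -ax_2 - f(x_1)$ and picks, purely for illustration, the gain $f(x_1) = x_1 + 5x_1^3$, whose slope $1 + 15x_1^2$ has $L = 1$ but no finite upper bound, so condition (ii) cannot hold. The trial function $V$ of equation (19) is still found not to increase along the simulated trajectory. All names and numerical values are hypothetical.

```python
def f(x1):
    """Illustrative nonlinear gain; its slope 1 + 15*x1**2 is unbounded above."""
    return x1 + 5.0 * x1 ** 3

def F(x1):
    """Integral of f from 0 to x1."""
    return 0.5 * x1 ** 2 + 1.25 * x1 ** 4

def rhs(x, a):
    """Assumed form of equation (18)."""
    x1, x2 = x
    return (x2, -a * x2 - f(x1))

def rk4_step(x, a, dt):
    """One classical Runge-Kutta step."""
    def add(u, v, s):
        return (u[0] + s * v[0], u[1] + s * v[1])
    k1 = rhs(x, a)
    k2 = rhs(add(x, k1, 0.5 * dt), a)
    k3 = rhs(add(x, k2, 0.5 * dt), a)
    k4 = rhs(add(x, k3, dt), a)
    return (x[0] + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
            x[1] + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]))

def V(x, a):
    """Trial Lyapunov function of equation (19)."""
    x1, x2 = x
    return F(x1) + 0.5 * (a * x1 + x2) ** 2

def simulate(x0, a=1.0, dt=1e-3, steps=40000):
    """Return the final state and the history of V along the trajectory."""
    x = x0
    values = [V(x, a)]
    for _ in range(steps):
        x = rk4_step(x, a, dt)
        values.append(V(x, a))
    return x, values

x_final, values = simulate((1.0, 0.0))
```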


Conclusion 


Kinetic Lyapunov functions are essentially a subclass of
Lyapunov functions. Their use leads naturally to linearization
and to a sufficient condition for uniform asymptotic stability in the
large, a condition which is independent of the steady-state
equilibrium point $x_e$ for a class of nonlinear control systems. The
existence of a kinetic Lyapunov function is usually too strong a
condition for uniform asymptotic stability in the large, but it is
useful whenever such a function can be found.
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APPENDIX I 


Let x = (1; xo, to) represent a solution of equation (1) with the 
initial condition x = x9 att = t&. The point to be proved is that 
given any xo, 4 and an arbitrarily small r > 0, there exists a 
r) such that 

I(t; x0, to.) — <r (21) 
holds for at least one of the equilibrium points x, for all t > & 
+ 7). 

Let u be the largest constant satisfying 


< (22) 
and v be the smallest constant satisfying 

a[m(r)] < Blr) (23) 
Since K is a decreasing function of t 

< 1) < K(x, xo, to) < 
and it follows from the definition of u that 
< for allt > & (24) 

Similarly, if at any time 4, 

20, to)|| <» (25) 


then for all ¢ > 4 
K(0, 0, t) < B(v) = alm(r)) 
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and it follows from the definition of a 
m(r) for alll 


Let M denote the minimum value of y({'x||) in the closed interval 


t>t 


(26) 


v < <u 

M is a function of u and v which are in turn functions of |{:to|| 
andr. Let 7°(\\io||, r) be defined as 

r) = (27) 
if |\)| > v for all t in the interval to < t < t + 7, then 

lo + T) £ t)dt + K(x, Xo, to) 

< K(x, t.) — TM < TM = 0 


in contradiction to the definition of the kinetic Lyapunov function 
Therefore inequality (25) must be satisfied at some ¢, in the in- 
terval tp <t; < t+ T. Inequality (26) follows for all > t + T, 
and inequality (21) is implied from inequality (26) and condition 
(ii) of the theorem. 


APPENDIX 2 


To prove algebraic relation number 4, one notes that, if $x'Ax$ is
a positive definite quadratic form in $x$, and $x = Ty$ is a nonsingular
linear transformation, $y'T'ATy$ is also a positive definite
quadratic form in $y$. Therefore $A$ being positive definite implies
$T'AT$ being positive definite and vice versa. Since, with $B = T'T$,

$$(T^{-1})'(A'B + BA)T^{-1} = (T^{-1})'A'T' + TAT^{-1} = (TAT^{-1})' + TAT^{-1} \qquad (28)$$

$A'B + BA$ being positive definite implies $TAT^{-1}$ being positive
definite and vice versa.


APPENDIX 3 


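Condition (i) is necessary for the matrices to have all positive
eigenvalues. To establish condition (ii) a general form of the
common orientator is assumed:

$$B = \begin{pmatrix} 1 & c \\ c & b \end{pmatrix}$$

Then

$$BA = \begin{pmatrix} 1 & c \\ c & b \end{pmatrix}\begin{pmatrix} 0 & -1 \\ f' & a \end{pmatrix} = \begin{pmatrix} cf' & ac - 1 \\ bf' & ab - c \end{pmatrix}$$

The condition for $BA$ to be positive definite is $c > 0$, $ab - c > 0$,
and

$$4cf'(ab - c) - (ac - 1 + bf')^2 > 0 \qquad (29)$$

Inequality (29) can be written as

$$(f' - \lambda_1)(f' - \lambda_2) < 0 \qquad (30)$$

with

$$\lambda_1 + \lambda_2 = \frac{4c(ab - c) + 2b(1 - ac)}{b^2} \qquad (31)$$

$$\lambda_1\lambda_2 = \frac{(1 - ac)^2}{b^2} \qquad (32)$$

Combining equations (31) and (32) gives

$$(\sqrt{\lambda_2} - \sqrt{\lambda_1})^2 = 4\,\frac{c}{b}\left(a - \frac{c}{b}\right) \qquad (33)$$

Maximizing the right-hand side of equation (33) gives

$$\frac{c}{b} = \frac{a}{2} \qquad (34)$$

Substituting equation (34) into (33) gives

$$\sqrt{\lambda_2} - \sqrt{\lambda_1} = a \qquad (35)$$

Inequality (30) is satisfied for all $f'$ only by a choice of $\lambda_1$ and $\lambda_2$
such that $\lambda_1 \le L < U \le \lambda_2$. This can be done if (ii) is satisfied by
choosing

$$\frac{1}{b} = \sqrt{LU} + \frac{a^2}{2}$$

and $c = ab/2$.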
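
The construction above lends itself to a quick numerical check. The sketch below assumes the orientator has the form $B = \begin{pmatrix} 1 & c \\ c & b \end{pmatrix}$ with $1/b = \sqrt{LU} + a^2/2$ and $c = ab/2$, and tests inequality (29) at sample slopes between $L$ and $U$. The function names and the sample values of $a$, $L$, $U$ are hypothetical.

```python
import math

def satisfies_condition_ii(a, L, U):
    """Condition (ii), read as U < (a + sqrt(L))**2."""
    return U < (a + math.sqrt(L)) ** 2

def common_orientator(a, L, U):
    """Return (b, c) for B = [[1, c], [c, b]] using the choice in Appendix 3."""
    b = 1.0 / (math.sqrt(L * U) + 0.5 * a * a)
    c = 0.5 * a * b
    return b, c

def ba_is_positive_definite(a, b, c, fp):
    """Test the quadratic form of BA, A = [[0, -1], [fp, a]], via inequality (29)."""
    return (c > 0.0 and a * b - c > 0.0
            and 4.0 * c * fp * (a * b - c) - (a * c - 1.0 + b * fp) ** 2 > 0.0)

def orientates_whole_range(a, L, U, samples=101):
    """Check inequality (29) at evenly spaced slopes from L to U inclusive."""
    b, c = common_orientator(a, L, U)
    return all(ba_is_positive_definite(a, b, c, L + (U - L) * k / (samples - 1))
               for k in range(samples))

ok_inside = orientates_whole_range(1.0, 1.0, 3.0)   # condition (ii) holds
ok_outside = orientates_whole_range(1.0, 1.0, 5.0)  # condition (ii) violated
```

Since the left side of (29) is a concave quadratic in $f'$, checking the two end slopes $L$ and $U$ would already be enough; the intermediate samples are kept only to make the sketch easy to inspect.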
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New Results in Linear Filtering and 
Prediction Theory¹
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A nonlinear differential equation of the Riccati type is derived for the covariance 
matrix of the optimal filtering error. The solution of this “variance equation”
completely specifies the optimal filter for either finite or infinite smoothing
intervals and stationary or nonstationary statistics.


R. S. BUCY
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The variance equation is closely related to the Hamiltonian (canonical) differential 
equations of the calculus of variations. Analytic solutions are available in some cases.
The significance of the variance equation is illustrated by examples which duplicate,
simplify, or extend earlier results in this field.

The Duality Principle relating stochastic estimation and deterministic control
problems plays an important role in the proof of theoretical results. In several examples,
the estimation problem and its dual are discussed side-by-side.

Properties of the variance equation are of great interest in the theory of adaptive
systems.


1 Introduction 


AT PRESENT, a nonspecialist might well regard the
Wiener-Kolmogorov theory of filtering and prediction [1, 2]² as
“classical”; in short, a field where the techniques are well
established and only minor improvements and generalizations
can be expected.

That this is not really so can be seen convincingly from recent 
results of Shinbrot [3], Steeg [4], Pugachev [5, 6], and Parzen [7]. 
Using a variety of time-domain methods, these investigators have 
solved some long-standing problems in nonstationary filtering and 
prediction theory. We present here a unified account of our own 
independent researches during the past two years (which overlap 
with much of the work [3-7] just mentioned), as well as numerous 
new results. We, too, use time-domain methods, and obtain 
major improvements and generalizations of the conventional 
Wiener theory. In particular, our methods apply without 
modification to multivariate problems. 

The following is the historical background of this paper. 

In an extension of the standard Wiener filtering problem, Follin 
[8] obtained relationships between time-varying gains and error 
variances for a given circuit configuration. Later, Hanson [9] 
proved that Follin’s circuit configuration was actually optimal 
for the assumed statistics; moreover, he showed that the differen- 
tial equations for the error variance (first obtained by Follin) 
follow rigorously from the Wiener-Hopf equation. These results 
were then generalized by Bucy [10], who found explicit rela- 
tionships between the optimal weighting functions and the error 
variances; he also gave a rigorous derivation of the variance 
equations and those of the optimal filter for a wide class of non- 
stationary signal and noise statistics. 

Independently of the work just mentioned, Kalman [11] gave 
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the Bureau of Naval Weapons under Contract NOrd-73861. 
Some aspects of this are considered briefly.


a new approach to the standard filtering and prediction problem. 
The novelty consisted in combining two well-known ideas: 


(i) the “‘state-transition’’ method of describing dynamical sys- 
tems [12-14], and 

(ii) linear filtering regarded as orthogonal projection in Hilbert
space [15, pp. 150-155].


As an important by-product, this approach yielded the Duality
Principle [11, 16] which provides a link between (stochastic)
filtering theory and (deterministic) control theory. Because of
the duality, results on the optimal design of linear control systems 
[13, 16, 17] are directly applicable to the Wiener problem. Dual- 
ity plays an important role in this paper also. 

When the authors became aware of each other’s work, it was 
soon realized that the principal conclusion of both investigations 
was identical, in spite of the difference in methods: 

Rather than attack the Wiener-Hopf integral equation directly,
it is better to convert it into a nonlinear differential equation, whose 
solution yields the covariance matrix of the minimum filtering error, 
which in turn contains all necessary information for the design of the 
optimal filter. 


2 Summary of Results: Description 


The problem considered in this paper is stated precisely in 
Section 4. There are two main assumptions: 

($A_1$) A sufficiently accurate model of the message process is
given by a linear (possibly time-varying) dynamical system
excited by white noise.

($A_2$) Every observed signal contains an additive white noise
component.

Assumption ($A_2$) is unnecessary when the random processes in
question are sampled (discrete-time parameter); see [11]. Even
in the continuous-time case, ($A_2$) is no real restriction since it can
be removed in various ways as will be shown in a future paper.
Assumption ($A_1$), however, is quite basic; it is analogous to but
somewhat less restrictive than the assumption of rational spectra
in the conventional theory.

Within these assumptions, we seek the best linear estimate of 
the message based on past data lying in either a finite or infinite 
time-interval. 

The fundamental relations of our new approach consist of five 
equations: 
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(I) The differential equation governing the optimal filter, 
which is excited by the observed signals and generates the best 
linear estimate of the message. 

(II) The differential equations governing the error of the best
linear estimate.

(III) The time-varying gains of the optimal filter expressed in
terms of the error variances.

(IV) The nonlinear differential equation governing the co- 
variance matrix of the errors of the best linear estimate, called the 
variance equation. 

(V) The formula for prediction. 

The solution of the variance equation for a given finite time- 
interval is equivalent to the solution of the estimation or pre- 
diction problem with respect to the same time-interval. The 
steady-state solution of the variance equation corresponds to 
finding the best estimate based on all the data in the past. 

As a special case, one gets the solution of the classical (station- 
ary) Wiener problem by finding the unique equilibrium point of 
the variance equation. This requires solving a set of algebraic 
equations and constitutes a new method of designing Wiener 
filters. The superior effectiveness of this procedure over present 
methods is shown in the examples. 
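
To make the idea of an equilibrium point of the variance equation concrete, the sketch below uses the simplest scalar case. It is an illustration under assumptions that are not stated in this section: the message is modeled as $\dot{x} = ax + u$, the observation as $z = x + v$, and $u$, $v$ are white noises of intensities $q$ and $r$. For that model the variance equation takes the Riccati form $\dot{P} = 2aP - P^2/r + q$, whose nonnegative equilibrium is $P = r\,(a + \sqrt{a^2 + q/r})$. The symbols and function names are hypothetical.

```python
import math

def variance_rate(p, a, q, r):
    """Right-hand side of the assumed scalar variance equation."""
    return 2.0 * a * p - p * p / r + q

def integrate_variance(p0, a, q, r, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of the variance equation."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        k1 = variance_rate(p, a, q, r)
        k2 = variance_rate(p + 0.5 * dt * k1, a, q, r)
        k3 = variance_rate(p + 0.5 * dt * k2, a, q, r)
        k4 = variance_rate(p + dt * k3, a, q, r)
        p += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

def steady_state_variance(a, q, r):
    """Nonnegative root of 2*a*P - P**2/r + q = 0."""
    return r * (a + math.sqrt(a * a + q / r))

a, q, r = -0.5, 2.0, 0.5
p_transient = integrate_variance(0.0, a, q, r, t_end=10.0, steps=2000)
p_steady = steady_state_variance(a, q, r)
gain = p_steady / r  # steady-state filter gain for this scalar model
```

Solving the algebraic equation gives the stationary filter directly, while integrating from a given initial variance gives the time-varying gains for a finite observation interval.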

Some of the preceding ideas are implicit already in [10, 11]; 
they appear here in a fully developed form. Other more ad- 
vanced problems have been investigated only very recently and 
provide incentives for much further research. We discuss the 
following further results: 


(1) The variance equations are of the Riccati type which occur 
in the calculus of variations and are closely related to the canonical 
differential equations of Hamilton. This relationship gives rise 
to a well-known analytic formula for the solution of the Riccati 
equation [17, 18]. The Hamiltonian equations have also been 
used recently [19] in the study of optimal control systems. The 
two types of problems are actually duals of one another as men- 
tioned in the Introduction. The duality is illustrated by several 
examples. 

(2) A sufficient condition for the existence of steady-state solu- 
tions of the variance equation (i.e., the fact that the error variance 
does not increase indefinitely) is that the information matrix in 
the sense of R. A. Fisher [20] be nonsingular. This condition is 
considerably weaker than the usual assumption that the message 
process have finite variance. 

(3) A sufficient condition for the optimal filter to be stable is 
the dual of the preceding condition. 


The preceding results are established with the aid of the ‘‘state- 
transition” method of analysis of dynamical systems. This con- 
sists essentially of the systematic use of vector-matrix notation 
which results in simple and clear statements of the main results 
independently of the complexity of specific problems. This is 
the reason why multivariable filtering problems can be treated by 
our methods without any additional theoretical complications. 

The outline of contents is as follows: 

In Section 3 we review the description of dynamical systems 
from the state point of view. Sections 4-5 contain precise state- 
ments of the filtering problem and of the dual control problem. 
The examples in Section 6 illustrate the filtering problem and its 
dual in conventional block-diagram terminology. Section 7 con- 
tains a precise statement of all mathematical results. A reader 
interested mainly in applications may pass from Section 7 directly 
to the worked-out examples in Section 11. The rigorous deriva- 
tion of the fundamental equations is given in Section 8. Section 9 
outlines proofs, based on the Duality Principle, of the existence 
and stability of solutions of the variance equation. The theory 
of analytic solutions of the variance equation is discussed in 
Section 10. In Section 12 we examine briefly the relation of our 
results to adaptive filtering problems. A critical evaluation of 
the current status of the statistical filtering problem is presented 
in Section 13. 


3 Preliminaries 


In the main, we shall follow the notation conventions (though 
not the specific nomenclature) of [11], [16], and [21]. Thus $\tau$, $t$, $t_0$ 
refer to the time; $\alpha, \beta, \ldots, x_1, x_2, \ldots, \phi_{ij}, a_{ij}, \ldots$ are 
(real) scalars; $a, b, \ldots, x, y, \ldots, \phi, \psi, \ldots$ are vectors; $A, B, 
\ldots, \Phi, \Psi, \ldots$ are matrices. The prime denotes the transposed 
matrix; thus $x'y$ is the scalar (inner) product and $xy'$ denotes 
a matrix with elements $x_i y_j$ (outer product). $\|x\| = (x'x)^{1/2}$ is 
the euclidean norm and $\|x\|^2_A$ (where $A$ is a nonnegative definite 
matrix) is the quadratic form with respect to $A$. The eigenvalues 
of a matrix $A$ are written as $\lambda_i(A)$. The expected value (ensemble 
average) is denoted by $\mathcal{E}$ (usually not followed by brackets). 
The covariance matrix of two vector-valued random variables 
$x(t)$, $y(\tau)$ is denoted by 


$$\mathcal{E}x(t)y'(\tau) - \mathcal{E}x(t)\,\mathcal{E}y'(\tau) \quad \text{or} \quad \operatorname{cov}[x(t), y(\tau)]$$


depending on what form is more convenient. 
Real-valued linear functions of a vector $x$ will be denoted by 
$x^*$; the value of $x^*$ at $x$ is denoted by 


$$[x^*, x] = \sum_{i=1}^{n} x_i^* x_i$$


where the $x_i$ are the co-ordinates of $x$. As is well known, $x^*$ may 
be regarded abstractly as an element of the dual vector space of the 
$x$'s; for this reason, $x^*$ is called a covector and its co-ordinates are 
the $x_i^*$. In algebraic manipulations we regard $x^*$ formally as a 
row vector (remembering, of course, that $x^* \neq x'$). Thus the 
inner product is $x^*y^{*\prime}$ and we define $\|x^*\|$ by $(x^*x^{*\prime})^{1/2}$. Also 


$$\mathcal{E}[x^*, x]^2 = \mathcal{E}(x^*x)^2 = \mathcal{E}x^*xx'x^{*\prime} = x^*(\mathcal{E}xx')x^{*\prime} = \|x^{*\prime}\|^2_{\mathcal{E}xx'}$$

To establish the terminology, we now review the essentials of 
the so-called state-transition method of analysis of dynamical 
systems. For more details see, for instance, [21]. 

A linear dynamical system governed by an ordinary differential 
equation can always be described in such a way that the defin- 
ing equations are in the standard form: 


$$dx/dt = F(t)x + G(t)u(t) \qquad (1)$$


where $x$ is an $n$-vector, called the state; the co-ordinates $x_i$ of $x$ 
are called state variables; $u(t)$ is an $m$-vector, called the control 
function; $F(t)$ and $G(t)$ are $n \times n$ and $n \times m$ matrices, respectively, 
whose elements are continuous functions of the time $t$. 

The description (1) is incomplete without specifying the out- 
put y(t) of the system; this may be taken as a p-vector whose 
components are linear combinations of the state variables: 


$$y(t) = H(t)x(t) \qquad (2)$$


where $H(t)$ is a $p \times n$ matrix continuous in $t$. 

The matrices F, G, H can be usually determined by inspection 
if the system equations are given in block diagram form. See 
the examples in Section 6. It should be remembered that any of 
these matrices may be nonsingular. F represents the dynamics, G 
the constraints on affecting the state of the system by inputs, and 
H the constraints on observing the state of the system from out- 
puts. For single-input/single-output systems, G and H consist 
of a single column and single row, respectively. 

If $F$, $G$, $H$ are constants, (1) is a constant system. If $u(t) = 0$ 
or, equivalently, $G = 0$, (1) is said to be free. 

It is well known [21-23] that the general solution of (1) may 
be written in the form 


$$x(t) = \Phi(t, t_0)x(t_0) + \int_{t_0}^{t} \Phi(t, \tau)G(\tau)u(\tau)\,d\tau \qquad (3)$$


where we call $\Phi(t, t_0)$ the transition matrix of (1). The transition 
matrix is a nonsingular matrix satisfying the differential equation 


$$d\Phi/dt = F(t)\Phi \qquad (4)$$


(any such matrix is a fundamental matrix [23, Chapter 3]), made 
unique by the additional requirement that, for all $t_0$, 


$$\Phi(t_0, t_0) = I = \text{unit matrix} \qquad (5)$$


The following properties are immediate by the existence and 
uniqueness of solutions of (1): 


$$\Phi^{-1}(t_1, t_0) = \Phi(t_0, t_1) \quad \text{for all } t_0, t_1 \qquad (6)$$
$$\Phi(t_2, t_0) = \Phi(t_2, t_1)\Phi(t_1, t_0) \quad \text{for all } t_0, t_1, t_2 \qquad (7)$$


If $F$ = const, then the transition matrix can be represented by 
the well-known formula 


$$\Phi(t, t_0) = \exp F(t - t_0) = \sum_{i=0}^{\infty} [F(t - t_0)]^i / i! \qquad (8)$$


which is quite convenient for numerical computations. In this 
special case, one can also express ® analytically in terms of the 
eigenvalues of F, using either linear algebra [22] or standard 
transfer-function techniques [14]. 
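
As a rough numerical illustration of the constant-coefficient case, the sketch below sums a truncated power series for $\exp F(t - t_0)$ and checks the identity, inverse, and composition properties of the transition matrix. The matrix `F`, the time points, and the number of series terms are illustrative assumptions, not values taken from the paper.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transition_matrix(F, t, t0, terms=40):
    """Truncated series sum_i [F (t - t0)]^i / i! for a constant matrix F."""
    n = len(F)
    A = [[F[i][j] * (t - t0) for j in range(n)] for i in range(n)]
    total = identity(n)
    term = identity(n)
    for i in range(1, terms):
        # term now holds A^i / i!
        term = [[x / i for x in row] for row in mat_mul(term, A)]
        total = [[total[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return total

def max_abs_diff(a, b):
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

# Assumed example matrix (a second-order constant system).
F = [[0.0, 1.0], [0.0, -1.0]]
t0, t1, t2 = 0.0, 0.7, 1.5

phi_10 = transition_matrix(F, t1, t0)
phi_21 = transition_matrix(F, t2, t1)
phi_20 = transition_matrix(F, t2, t0)
phi_01 = transition_matrix(F, t0, t1)

print("identity at t = t0:", max_abs_diff(transition_matrix(F, t0, t0), identity(2)))
print("inverse property  :", max_abs_diff(mat_mul(phi_01, phi_10), identity(2)))
print("composition       :", max_abs_diff(phi_20, mat_mul(phi_21, phi_10)))
```

For this particular `F` the series can also be summed by hand, giving the upper-right entry $1 - e^{-(t - t_0)}$, which offers an independent check on the truncation.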

In some cases, it is convenient to replace the right-hand side of 
(3) by a notation that focuses attention on how the state of the 
system “moves’’ in the state space as a function of time. Thus 
we write the left-hand side of (3) as 


$$x(t) = \phi(t; x, t_0; u) \qquad (9)$$


Read: The state of the system (1) at time $t$, evolving from the 
initial state $x = x(t_0)$ at time $t_0$ under the action of a fixed forcing 
function $u(t)$. For simplicity, we refer to $\phi$ as the motion of the 
dynamical system. 


4 Statement of Problem 


We shall be concerned with the continuous-time analog of 
Problem I of reference [11], which should be consulted for the 
physical motivation of the assumptions stated below. 

(A$_1$) The message is a random process $x(t)$ generated by the 
model 


$$dx/dt = F(t)x + G(t)u(t) \qquad (10)$$

The observed signal is 


$$z(t) = y(t) + v(t) = H(t)x(t) + v(t) \qquad (11)$$


The functions $u(t)$, $v(t)$ in (10-11) are independent random processes 
(white noise) with identically zero means and covariance 
matrices 


$$\begin{aligned}
\operatorname{cov}[u(t), u(\tau)] &= Q(t)\,\delta(t - \tau)\\
\operatorname{cov}[v(t), v(\tau)] &= R(t)\,\delta(t - \tau)\\
\operatorname{cov}[u(t), v(\tau)] &= 0
\end{aligned} \quad \text{for all } t, \tau \qquad (12)$$


where $\delta$ is the Dirac delta function, and $Q(t)$, $R(t)$ are symmetric, 
nonnegative definite matrices continuously differentiable in $t$. 

We introduce already here a restrictive assumption, which is 
needed for the ensuing theoretical developments: 

(A$_2$) The matrix $R(t)$ is positive definite for all $t$. Physically, 
this means that no component of the signal can be measured 
exactly. 

To determine the random process $x(t)$ uniquely, it is necessary 
to add a further assumption. This may be done in two different 
ways: 

(A$_3$) The dynamical system (10) has reached "steady-state" 
under the action of $u(t)$, in other words, $x(t)$ is the random function 
defined by 

$$x(t) = \int_{-\infty}^{t} \Phi(t, \tau)G(\tau)u(\tau)\,d\tau \qquad (13)$$

This formula is valid if the system (10) is uniformly asymptotically 
stable (for precise definition, valid also in the nonconstant 
case, see [21]). If, in addition, it is true that $F$, $G$, $Q$ are 
constant, then x(t) is a stationary random process—this is one of 
the chief assumptions of the original Wiener theory. 

However, the requirement of asymptotic stability is incon- 
venient in some cases. For instance, it is not satisfied in Example 
5, which is a useful model in some missile guidance problems. 
Moreover, the representation of random functions as generated 
by a linear dynamical system is already an appreciable restriction 
and one should try to avoid making any further assumptions. 
Hence we prefer to use: 

(A$_3$') The measurement of $z(t)$ starts at some fixed instant $t_0$ 
of time (which may be $-\infty$), at which time $\operatorname{cov}[x(t_0), x(t_0)]$ is 
known. 

Assumption (A$_3$) is obviously a special case of (A$_3$'). Moreover, 
since (10) is not necessarily stable, this way of proceeding makes 
it possible to treat also situations where the message variance 
grows indefinitely, which is excluded in the conventional theory. 

The main object of the paper is to study the 

OPTIMAL ESTIMATION PROBLEM. Given known values 
of $z(\tau)$ in the time-interval $t_0 \le \tau \le t$, find an estimate $\hat{x}(t_1|t)$ of 
$x(t_1)$ of the form 

$$\hat{x}(t_1|t) = \int_{t_0}^{t} A(t_1, \tau)z(\tau)\,d\tau \qquad (14)$$

(where $A$ is an $n \times p$ matrix whose elements are continuously 
differentiable in both arguments) with the property that the expected 
squared error in estimating any linear function of the message is 
minimized: 

$$\mathcal{E}[x^*, x(t_1) - \hat{x}(t_1|t)]^2 = \text{minimum for all } x^* \qquad (15)$$


Remarks. (a) Obviously this problem includes as a special 
case the more common one in which it is desired to minimize 


$$\mathcal{E}\|x(t_1) - \hat{x}(t_1|t)\|^2$$


(b) In view of (A$_1$), it is clear that $\mathcal{E}x(t_1) = \mathcal{E}\hat{x}(t_1|t) = 0$. 
Hence $[x^*, \hat{x}(t_1|t)]$ is the minimum variance linear unbiased 
estimate of the value of any costate $x^*$ at $x(t_1)$. 

(c) If $\mathcal{E}u(t)$ is unknown, we have a more difficult problem which 
will be considered in a future paper. 

(d) It may be recalled (see, e.g., [11]) that if $u$ and $v$ are 
gaussian, then so are also x and z, and therefore the best estimate 
will be of the type (14). Moreover, the same estimate will be best 
not only for the loss function (15) but also for a wide variety of 
other loss functions. 

(e) The representation of white noise in the form (12) is not 
rigorous, because of the use of delta “functions.”” But since the 
delta function occurs only in integrals, the difficulty is easily re- 
moved as we shall show in a future paper addressed to mathema- 
ticians. All other mathematical developments given in the paper 
are rigorous. 

The solution of the estimation problem under assumptions 
(A$_1$), (A$_2$), (A$_3$') is stated in Section 7 and proved in Section 8. 


5 The Dual Problem 


It will be useful to consider now the dual of the optimal estima- 
tion problem which turns out to be the optimal regulator problem 
in the theory of control. 


First we define a dynamical system which is the dual (or adjoint) 
of (1). Let 

$$\begin{aligned}
t^* &= -t\\
F^*(t^*) &= F'(t)\\
G^*(t^*) &= H'(t)\\
H^*(t^*) &= G'(t)
\end{aligned} \qquad (16)$$

Let $\Phi^*(t^*, t_0^*)$ be the transition matrix of the dual dynamical 
system of (1): 

$$dx^*/dt^* = F^*(t^*)x^* + G^*(t^*)u^*(t^*) \qquad (17)$$

It is easy to verify the fundamental relation 

$$\Phi^*(t^*, t_0^*) = \Phi'(t_0, t) \qquad (18)$$
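
One way to carry out the verification of the relation between the dual and original transition matrices is sketched below. It uses only the defining equation of the transition matrix, the inverse property $\Phi(t_0, t)\Phi(t, t_0) = I$, and the substitution $t^* = -t$; the intermediate steps are our own filling-in, not taken from the paper.

```latex
\begin{aligned}
0 &= \frac{\partial}{\partial t}\bigl[\Phi(t_0,t)\,\Phi(t,t_0)\bigr]
   = \frac{\partial \Phi(t_0,t)}{\partial t}\,\Phi(t,t_0) + \Phi(t_0,t)\,F(t)\,\Phi(t,t_0)
   \;\Longrightarrow\;
   \frac{\partial \Phi(t_0,t)}{\partial t} = -\Phi(t_0,t)\,F(t), \\[4pt]
\frac{\partial \Phi'(t_0,t)}{\partial t^*}
  &= -\frac{\partial \Phi'(t_0,t)}{\partial t}
   = F'(t)\,\Phi'(t_0,t)
   = F^*(t^*)\,\Phi'(t_0,t), \\[4pt]
\Phi'(t_0,t)\Big|_{t^* = t_0^*} &= \Phi'(t_0,t_0) = I .
\end{aligned}
```

Thus $\Phi'(t_0, t)$, regarded as a function of $t^*$, satisfies the differential equation and the initial condition that define the transition matrix of the dual system, and the two coincide by uniqueness.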


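With these notation conventions, we can now state the 
OPTIMAL REGULATOR PROBLEM. Consider the linear 
dynamical system (17). Find a "control law" 


$$u^*(t^*) = k^*(x^*(t^*), t^*) \qquad (19)$$


with the property that, for this choice of $u^*(t^*)$, the "performance 
index" 


$$V(x^*; t^*, t_0^*; u^*) = \|\phi^*(t_0^*; x^*, t^*; u^*)\|^2_{P_0} + \int_{t^*}^{t_0^*} \left\{ \|y^*(\tau^*)\|^2_{Q(\tau^*)} + \|u^*(\tau^*)\|^2_{R(\tau^*)} \right\} d\tau^* \qquad (20)$$


assumes its greatest lower bound. 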

This is a natural generalization of the well-known problem of 
the optimization of a regulator with integrated-squared-error 
type of performance index. 

The mathematical theory of the optimal regulator problem has 
been explored in considerable detail [17]. These results can be 
applied directly to the optimal estimation problem because of the 

DUALITY THEOREM. The solutions of the optimal estima- 
tion problem and of the optimal regulator problem are equivalent 
under the duality relations (16). 

The nature of these solutions will be discussed in the sequel. 
Here we pause only to observe a trivial point: By (14), the solu- 
tions of the estimation problem are necessarily linear; hence the 
same must be true (if the duality theorem is correct) of the solu- 
tions of the optimal regulator problem; in other words, the op- 
timal control law k* must be a linear function of x*. 

The first proof of the duality theorem appeared in [11], and 
consisted of comparing the end results of the solutions of the two 
problems. Assuming only that the solutions of both problems 
result in linear dynamical systems, the proof becomes much 
simpler and less mysterious; this argument was carried out in 
detail in [16]. 

Remark (f). If we generalize the optimal regulator problem to 
the extent of replacing the first integrand in (20) by 


$$\|y^*(\tau^*) - y_d^*(\tau^*)\|^2_{Q(\tau^*)}$$


where $y_d^*(t^*) \neq 0$ is the desired output (in other words, if the 
regulator problem is replaced by a servomechanism or follow-up 
problem), then we have the dual of the estimation problem with 
$\mathcal{E}u(t) \neq 0$. 


6 Examples: Problem Statement 


To illustrate the matrix formalism and the general problems 
stated in Sections 4-5, we present here some specific problems in 
the standard block-diagram terminology. The solution of these 
problems is given in Section 11. 

Example 1. Let the model of the message process be a first- 
order, linear, constant dynamical system. It is not assumed 
that the model is stable; but if so, this is the simplest problem in 
the Wiener theory which was discussed first by Wiener himself 
[1, pp. 91-92]. 


Fig. 1 Example 1: Block diagram of message process and optimal filter 


The model of the message process is shown in Fig. 1(a). The 
various matrices involved are all $1 \times 1$ and are 


$$F(t) = [f_{11}], \quad G(t) = [1], \quad H(t) = [1], \quad Q(t) = [q_{11}], \quad R(t) = [r_{11}]$$


The model is identical with its dual. Then the dual problem 
concerns the plant 


$$dx_1^*/dt^* = f_{11}x_1^* + u_1^*(t^*), \qquad y_1^*(t^*) = x_1^*(t^*)$$


and the performance index is 


$$\int_{t^*}^{t_0^*} \left\{ q_{11}[x_1^*(\tau^*)]^2 + r_{11}[u_1^*(\tau^*)]^2 \right\} d\tau^* \qquad (21)$$

The discrete-time version of the estimation problem was treated 
in [11, Example 1]. The dual problem was treated by Rozonoér 
[19]. 

Example 2. The message is generated as in Example 1, but 
now it is assumed that two separate signals (mixed with different 
noise) can be observed. Hence $R$ is now a $2 \times 2$ matrix 
and we assume that 

$$H = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$


The block diagram of the model is shown in Fig. 2(a). 


Fig. 2 Example 2: Block diagram of message process and optimal filter 


Example 3. The message is generated by putting white noise 
through the transfer function 1/s(s + 1). The block diagram of 
the model is shown in Fig. 3(a). The system matrices are: 


0 1 


In the dual model, the order of the blocks $1/s$ and $1/(s + 1)$ 
is interchanged. See Fig. 4. The performance index remains the 
same as (21). The dual problem was investigated by Kipiniak 
[24]. 


Fig. 3 Example 3: Block diagram of message process and optimal filter 
($x_1$ and $\hat{x}_1$ should be interchanged with $x_2$ and $\hat{x}_2$.) 


Fig. 5 Example 4: Block diagram of message process and optimal filter 


Example 4. The message is generated by putting white noise 
through the transfer function $s/(s^2 - f_{12}f_{21})$. The block diagram 
of the model is shown in Fig. 5(a). The system matrices are: 

$$F = \begin{bmatrix} 0 & f_{12} \\ f_{21} & 0 \end{bmatrix}, \quad G = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \end{bmatrix}$$

The transfer function of the dual model is also $s/(s^2 - f_{12}f_{21})$. 
However, in drawing the block diagram, the locations of the first 
and second state variables are interchanged, see Fig. 6. Evidently 
$f^*_{12} = f_{21}$ and $f^*_{21} = f_{12}$. The performance index is again 
given by (21). 

The message model for the next two examples is the same and 
is defined by: 

$$F = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad G = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

The differences between the two examples lie in the nature of the 
"starting" assumptions and in the observed signals. 

Example 5. Following Shinbrot [3], we consider the following 
situation. A particle leaves the origin at time $t_0 = 0$ with a fixed 
but unknown velocity of zero mean and known variance. The 
position of the particle is continually observed in the presence of 
additive white noise. We are to find the best estimator of position 
and velocity. 

The verbal description of the problem implies that $p_{11}(0) = 
p_{12}(0) = 0$, $p_{22}(0) > 0$ and $q_{11} = 0$. Moreover, $G = 0$, $H = 
[1 \;\; 0]$. See Fig. 7(a). 

The dual of this problem is somewhat unusual; it calls for 
minimizing the performance index 

$$p_{22}(0)[\phi_2^*(0; x^*, t^*; u^*)]^2 + \int_{t^*}^{0} r_{11}[u_1^*(\tau^*)]^2\,d\tau^* \qquad (t^* < 0)$$

In words: We are given a transfer function $1/s^2$; the input $u_1^*$ 
over the time-interval $[t^*, 0]$ should be selected in such a way as 
to minimize the sum of (i) the square of the velocity and (ii) the 
control energy. In the discrete-time case, this problem was 
treated in [11, Example 2]. 

Example 6. We assume here that the transfer function $1/s^2$ is 
excited by white noise and that both the position $x_1$ and velocity 
$x_2$ can be observed in the presence of noise. Therefore (see Fig. 
8(a)) 

$$H = \begin{bmatrix} h_{11} & 0 \\ 0 & h_{22} \end{bmatrix}$$

This problem was studied by Hanson [9] and Bucy [25, 26]. 
The dual problem is very similar to Examples 3 and 4. 


Fig. 7 Example 5: Block diagram of message process and optimal filter 

Fig. 8 Example 6: Block diagram of message process and optimal filter 

Fig. 9 General block diagram of optimal filter 


7 Summary of Results: Mathematics 


Here we present the main results of the paper in precise mathe- 
matical terms. At the present stage of our understanding of the 
problem, the rigorous proof of these facts is quite complicated, 
requiring advanced and unconventional methods; they are to be 
found in Sections 8-10. After reading this section, one may pass 
without loss of continuity to Section 11 which contains the solu- 
tions of the examples. 

(1) Canonical form of the optimal filter. The optimal estimate 
$\hat{x}(t|t)$ is generated by a linear dynamical system of the form 


$$\begin{aligned}
d\hat{x}(t|t)/dt &= F(t)\hat{x}(t|t) + K(t)\tilde{z}(t|t)\\
\tilde{z}(t|t) &= z(t) - H(t)\hat{x}(t|t)
\end{aligned} \qquad (\mathrm{I})$$

The initial state $\hat{x}(t_0|t_0)$ of (I) is zero. 
For optimal extrapolation, we add the relation 


$$\hat{x}(t_1|t) = \Phi(t_1, t)\hat{x}(t|t) \quad (t_1 \ge t) \qquad (\mathrm{V})$$

No similarly simple formula is known at present for interpolation 
($t_1 < t$). 

The block diagram of (I) and (V) is shown in Fig. 9. The 
variables appearing in this diagram are vectors and the "boxes" 
represent matrices operating on vectors. Otherwise (except for 
the noncommutativity of matrix multiplication) such generalized 
block diagrams are subject to the same rules as ordinary block 
diagrams. The fat lines indicating direction of signal flow serve 
as a reminder that we are dealing with multiple rather than 
single signals. 

The optimal filter (I) is a feedback system. It is obtained by 
taking a copy of the model of the message process (omitting the 
constraint at the input), forming the error signal $\tilde{z}(t|t)$ and feeding 
the error forward with a gain $K(t)$. Thus the specification of 
the optimal filter is equivalent to the computation of the optimal 
time-varying gains $K(t)$. This result is general and does not depend 
on constancy of the model. 

(2) Canonical form for the dynamical system governing the 
optimal error. Let 

$$\tilde{x}(t|t) = x(t) - \hat{x}(t|t) \qquad (22)$$


Except for the way in which the excitations enter the optimal 
error, $\tilde{x}(t|t)$ is governed by the same dynamical system as $\hat{x}(t|t)$: 

$$d\tilde{x}(t|t)/dt = F(t)\tilde{x}(t|t) + G(t)u(t) - K(t)[v(t) + H(t)\tilde{x}(t|t)] \qquad (\mathrm{II})$$


See Fig. 10. 

(3) Optimal gain. Let us introduce the abbreviation: 

$$P(t) = \operatorname{cov}[\tilde{x}(t|t), \tilde{x}(t|t)] \qquad (23)$$

Then it can be shown that 

$$K(t) = P(t)H'(t)R^{-1}(t) \qquad (\mathrm{III})$$


(4) Variance equation. The only remaining unknown is $P(t)$. 
It can be shown that $P(t)$ must be a solution of the matrix differential 
equation 

$$dP/dt = F(t)P + PF'(t) - PH'(t)R^{-1}(t)H(t)P + G(t)Q(t)G'(t) \qquad (\mathrm{IV})$$


This is the variance equation; it is a system of $n(n + 1)/2$ (footnote 4) nonlinear 
differential equations of the first order, and is of the Riccati 
type well known in the calculus of variations [17, 18]. 
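
To make the gain and variance relations concrete, the following sketch integrates the variance equation for a first-order model with $F = [f]$, $G = H = [1]$, $Q = [q]$, $R = [r]$, where it reduces to $dP/dt = 2fP - P^2/r + q$ with gain $K = P/r$. The forward-Euler scheme, the step size, and the numerical values of `f`, `q`, `r` are illustrative assumptions; a production implementation would normally use a higher-order or adaptive integrator.

```python
def variance_rhs(p, f, q, r):
    """Right-hand side of the scalar variance equation dP/dt = 2 f P - P^2 / r + q."""
    return 2.0 * f * p - p * p / r + q

def integrate_variance(f, q, r, p0, t_end, dt=1e-4, samples=5):
    """Forward-Euler integration; returns a few (t, P, K) samples along the way."""
    steps = int(round(t_end / dt))
    stride = max(1, steps // samples)
    p = p0
    history = [(0.0, p, p / r)]
    for k in range(1, steps + 1):
        p += dt * variance_rhs(p, f, q, r)
        if k % stride == 0:
            history.append((k * dt, p, p / r))  # gain K(t) = P(t) / r in this case
    return p, history

# Assumed values; f > 0 means the message model itself is unstable.
f, q, r = 0.5, 1.0, 2.0
p_final, history = integrate_variance(f, q, r, p0=0.0, t_end=20.0)

for t, p, k in history:
    print(f"t = {t:6.2f}   P = {p:.6f}   K = {k:.6f}")
print("residual of the variance equation at the end:", variance_rhs(p_final, f, q, r))
```

Starting from $P_0 = 0$, the variance increases and settles where the right-hand side vanishes, even though the assumed model is unstable; the gain settles with it.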

(5) Existence of solutions of the variance equation. Given any 
fixed initial time $t_0$ and a nonnegative definite matrix $P_0$, (IV) has 
a unique solution 


$$P(t) = \Pi(t; P_0, t_0) \qquad (24)$$


defined for all $|t - t_0|$ sufficiently small, which takes on the value 
$P(t_0) = P_0$ at $t = t_0$. This follows at once from the fact that (IV) 
satisfies a Lipschitz condition [21]. 

Since (IV) is nonlinear, we cannot of course conclude without 
further investigation that a solution $P(t)$ exists for all $t$ [21]. By 
taking into account the problem from which (IV) was derived, 
however, it can be shown that $P(t)$ in (24) is defined for all $t \ge t_0$. 

These results can be summarized by the following theorem, 
which is the analogue of Theorem 3 of [11] and is proved in 
Section 8: 

THEOREM 1. Under Assumptions (A$_1$), (A$_2$), (A$_3$'), the 
solution of the optimal estimation problem with $t_0 > -\infty$ is given by 
relations (I-V). The solution $P(t)$ of (IV) is uniquely determined 
for all $t \ge t_0$ by the specification of 


$$P_0 = \operatorname{cov}[x(t_0), x(t_0)];$$


knowledge of P(t) in turn determines the optimal gain K(t). The 
initial state of the optimal filter is 0. 

(6) Variance of the estimate of a costate. From (23) we have 
immediately the following formula for (15): 


$$\mathcal{E}[x^*, \tilde{x}(t|t)]^2 = \|x^{*\prime}\|^2_{P(t)} \qquad (25)$$


(7) Analytic solution of the variance equation. Because of the 
close relationship between the Riccati equation and the calculus 
of variations, a closed-form solution of sorts is available for (IV). 
The easiest way of obtaining it is as follows [17]: 

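Introduce the quadratic Hamiltonian function 


$$\mathcal{H}(x, w, t) = -\tfrac{1}{2}\|G'(t)x\|^2_{Q(t)} - w'F'(t)x + \tfrac{1}{2}\|H(t)w\|^2_{R^{-1}(t)} \qquad (26)$$

and consider the associated canonical differential equations (footnote 5) 

$$\begin{aligned}
dx/dt &= \partial\mathcal{H}/\partial w = -F'(t)x + H'(t)R^{-1}(t)H(t)w\\
dw/dt &= -\partial\mathcal{H}/\partial x = G(t)Q(t)G'(t)x + F(t)w
\end{aligned} \qquad (27)$$

We denote the transition matrix of (27) by 


$$\Theta(t, t_0) = \begin{bmatrix} \Theta_{11}(t, t_0) & \Theta_{12}(t, t_0) \\ \Theta_{21}(t, t_0) & \Theta_{22}(t, t_0) \end{bmatrix} \qquad (28)$$


Footnote 4: This is the number of distinct elements of the symmetric matrix $P(t)$. 

Footnote 5: The notation $\partial\mathcal{H}/\partial w$ means the gradient of the scalar $\mathcal{H}$ with 
respect to the vector $w$. 


In Section 10 we shall prove 
THEOREM 2. The solution of (IV) for arbitrary nonnegative 
definite, symmetric $P_0$ and all $t \ge t_0$ can be represented by the formula 


$$\Pi(t; P_0, t_0) = [\Theta_{21}(t, t_0) + \Theta_{22}(t, t_0)P_0] \cdot [\Theta_{11}(t, t_0) + \Theta_{12}(t, t_0)P_0]^{-1} \qquad (29)$$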


Unless all matrices occurring in (27) are constant, this result 
simply replaces one difficult problem by another of similar difficulty, 
since only in the rarest cases can $\Theta(t, t_0)$ be expressed in 
analytic form. Something has been accomplished, however, since 
we have shown that the solution of nonconstant estimation problems 
involves precisely the same analytic difficulties as the solution of linear 
differential equations with variable coefficients. 
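
In the constant scalar case the representation of Theorem 2 can be checked numerically. With scalars $f, g, h, q, r$ the canonical equations become $dx/dt = -fx + (h^2/r)w$, $dw/dt = g^2 q\,x + fw$, so the transition matrix is the exponential of a $2 \times 2$ matrix and the formula reads $\Pi = (\Theta_{21} + \Theta_{22}P_0)/(\Theta_{11} + \Theta_{12}P_0)$. The sketch below compares this with a direct Runge-Kutta integration of the variance equation; all numerical values, the series truncation, and the step count are assumptions chosen for illustration.

```python
def expm2(m, terms=60):
    """Truncated power series for the exponential of a 2x2 matrix."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for i in range(1, terms):
        # term <- term * m / i, so that term holds m^i / i!
        term = [[sum(term[r][k] * m[k][c] for k in range(2)) / i
                 for c in range(2)] for r in range(2)]
        total = [[total[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    return total

def riccati_closed_form(f, g, h, q, r, p0, t):
    """Scalar form of the Theorem 2 formula, with t measured from t0 = 0."""
    m = [[-f * t, (h * h / r) * t],
         [g * g * q * t, f * t]]
    th = expm2(m)
    return (th[1][0] + th[1][1] * p0) / (th[0][0] + th[0][1] * p0)

def riccati_rk4(f, g, h, q, r, p0, t, steps=2000):
    """Classical fourth-order Runge-Kutta integration of the scalar variance equation."""
    def rhs(p):
        return 2.0 * f * p - p * p * h * h / r + g * g * q
    dt = t / steps
    p = p0
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * dt * k1)
        k3 = rhs(p + 0.5 * dt * k2)
        k4 = rhs(p + dt * k3)
        p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p

# Assumed illustrative values.
f, g, h, q, r, p0, t = -0.3, 1.0, 1.0, 1.0, 0.5, 0.2, 1.5
closed = riccati_closed_form(f, g, h, q, r, p0, t)
numeric = riccati_rk4(f, g, h, q, r, p0, t)
print("closed form:", closed)
print("RK4        :", numeric)
print("difference :", abs(closed - numeric))
```

The agreement of the two numbers illustrates that, in the constant case, solving the nonlinear variance equation reduces to exponentiating the matrix of the linear canonical system.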

(8) Existence of steady-state solution. If the time-interval over 
which data are available is infinite, in other words, if $t_0 = -\infty$, 
Theorem 1 is not applicable without some further restriction. 

For instance, if $H(t) = 0$, the variance of $\tilde{x}$ is the same as the 
variance of $x$; if the model (10-11) is unstable, then $x(t)$ defined 
by (13) does not exist and the estimation problem is meaningless. 

The following theorem, proved in Section 9, gives two sufficient 
conditions for the steady-state estimation problem to be meaning- 
ful. The first is the one assumed at the very beginning in the 
conventional Wiener theory. The second condition, which we in- 
troduce here for the first time, is much weaker and more “natural’’ 
than the first; moreover, it is almost a necessary condition as well. 

THEOREM 3. Denote the solutions of (IV) as in (24). Then 
the limit 

$$\lim_{t_0 \to -\infty} \Pi(t; 0, t_0) = \bar{P}(t) \qquad (30)$$

exists for all $t$ and is a solution of (IV) if either 

(A$_4$) the model (10-11) is uniformly asymptotically stable; or 

(A$_4$') the model (10-11) is "completely observable" [17], that is, 
for all $t$ there is some $t_0(t) < t$ such that the matrix 


$$M(t_0, t) = \int_{t_0}^{t} \Phi'(\tau, t)H'(\tau)H(\tau)\Phi(\tau, t)\,d\tau \qquad (31)$$


is positive definite. (See [21] for the definition of uniform asymptotic 
stability.) 

Remarks. (g) $\bar{P}(t)$ is the covariance matrix of the optimal error 
corresponding to the very special situation in which (i) an arbitrarily 
long record of past measurements is available, and (ii) the 
initial state $x(t_0)$ was known exactly. When all matrices in 
(10-12) are constant, then so is also $\bar{P}$; this is just the classical 
Wiener problem. In the constant case, $\bar{P}$ is an equilibrium 
state of (IV) (i.e., for this choice of $P$, the right-hand side of (IV) 
is zero). In general, $\bar{P}(t)$ should be regarded as a moving equilibrium 
point of (IV), see Theorem 4 below. 

(h) The matrix $M(t_0, t)$ is well known in mathematical statistics. 
It is the information matrix in the sense of R. A. Fisher [20] 
corresponding to the special estimation problem when (i) $u(t) \equiv 0$ 
and (ii) $v(t)$ = gaussian with unit covariance matrix. In this 
case, the variance of any unbiased estimator $\mu(t)$ of $[x^*, x(t)]$ 
satisfies the well-known Cramér-Rao inequality [20] 


$$\mathcal{E}[\mu(t) - \mathcal{E}\mu(t)]^2 \ge \|x^{*\prime}\|^2_{M^{-1}(t_0, t)} \qquad (32)$$


Every costate $x^*$ has a minimum-variance unbiased estimator for 
which the equality sign holds in (32) if and only if $M$ is positive 
definite. This motivates the use of condition (A$_4$') in Theorem 3 
and the term "completely observable." 


Fig. 10 General block diagram of optimal estimation error 

(i) It can be shown [17] that in the constant case complete 
observability is equivalent to the easily verified condition: 


$$\operatorname{rank}[H', F'H', \ldots, (F')^{n-1}H'] = n \qquad (33)$$


where the square brackets denote a matrix with $n$ rows and $np$ 
columns. 
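
The rank condition for constant systems is easy to test mechanically. The sketch below builds the $n \times np$ matrix $[H', F'H', \ldots, (F')^{n-1}H']$ and computes its rank by exact Gaussian elimination over rationals. The helper names and the two test cases (a double integrator observed through position only, then through velocity only) are our own illustrative choices.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    """Product of an (r x k) and a (k x c) matrix given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def observability_matrix(F, H):
    """Return [H', F'H', ..., (F')^(n-1) H'] as a matrix with n rows and n*p columns."""
    n = len(F)
    Ft = transpose(F)
    block = transpose(H)          # n x p
    blocks = []
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(Ft, block)
    # Concatenate the blocks column-wise.
    return [sum((b[i] for b in blocks), []) for i in range(n)]

def rank(m):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

# Assumed test system: double integrator (transfer function 1/s^2).
F = [[0, 1], [0, 0]]
H_position = [[1, 0]]   # only the position is observed
H_velocity = [[0, 1]]   # only the velocity is observed

rank_position = rank(observability_matrix(F, H_position))
rank_velocity = rank(observability_matrix(F, H_velocity))
print("position observed: rank =", rank_position, "of n =", len(F))
print("velocity observed: rank =", rank_velocity, "of n =", len(F))
```

With position observed the rank equals $n = 2$, so the model is completely observable; with velocity alone the rank drops to 1, reflecting the fact that the initial position cannot be recovered from velocity measurements.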

(9) Stability of the optimal filter. It should be realized now that 
the optimality of the filter (I) does not at the same time guarantee 
its stability. The reader can easily check this by constructing an 
example (for instance, one in which (10-11) consists of two non-interacting 
systems). To establish weak sufficient conditions 
for stability entails some rather delicate mathematical technicalities 
which we shall bypass and state only the best final result currently 
available. 

First, some additional definitions. 

We say that the model (10-11) is uniformly completely observable 
if there exist fixed constants $\alpha_1$, $\alpha_2$, and $\sigma$ such that 


$$0 < \alpha_1\|x^*\|^2 \le \|x^{*\prime}\|^2_{M(t - \sigma, t)} \le \alpha_2\|x^*\|^2 \quad \text{for all } x^* \text{ and } t$$


Similarly, we say that a model is completely controllable [uniformly 
completely controllable] if the dual model is completely observable 
[uniformly completely observable]. For a discussion of 
these notions, the reader may refer to [17]. It should be noted 
that the property of "uniformity" is always true for constant 
systems. 

We can now state the central theorem of the paper: 


THEOREM 4. Assume that the model of the message process is 

(A$_4$'') uniformly completely observable; 

(A$_5$) uniformly completely controllable; 

(A$_6$) $\alpha_3 \le \|Q(t)\| \le \alpha_4$, $\alpha_5 \le \|R(t)\| \le \alpha_6$ for all $t$; 

(A$_7$) $\|F(t)\| \le \alpha_7$. 


Then the following is true: 


(i) The optimal filter is uniformly asymptotically stable; 

(ii) Every solution $\Pi(t; P_0, t_0)$ of the variance equation (IV) 
starting at a symmetric nonnegative matrix $P_0$ converges to $\bar{P}(t)$ 
(defined in Theorem 3) as $t \to \infty$. 


Remarks. (j) A filter which is not uniformly asymptotically 
stable may have an unbounded response to a bounded input [21]; 
the practical usefulness of such a filter is rather limited. 

(k) Property (ii) in Theorem 4 is of central importance since it 
shows that the variance equation is a “stable’’ computational 
method that may be expected to be rather insensitive to roundoff 
errors. 

(l) The speed of convergence of $P_0(t)$ to $\bar{P}(t)$ can be estimated 
quite effectively using the second method of Lyapunov; see [17]. 

(10) Solution of the classical Wiener problem. Theorems 3 and 4 
have the following immediate corollary: 

THEOREM 5. Assume the hypotheses of Theorems 3 and 4 
are satisfied and that $F$, $G$, $H$, $Q$, $R$ are constants. 

Then, if $t_0 = -\infty$, the solution of the estimation problem is obtained 
by setting the right-hand side of (IV) equal to zero and solving 
the resulting set of quadratic algebraic equations. That solution 
which is nonnegative definite is equal to $\bar{P}$. 

To prove this, we observe that, by the assumption of constancy, 
$\bar{P}(t)$ is a constant. By Theorem 4, all solutions of (IV) 
starting at nonnegative matrices converge to $\bar{P}$. Hence, if a 
matrix $P$ is found for which the right-hand side of (IV) vanishes 
and if this matrix is nonnegative definite, it must be identical 
with $\bar{P}$. Note, however, that the procedure may fail if the conditions 
of Theorems 3 and 4 are not satisfied. See Example 4. 
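
For a first-order constant model the recipe of Theorem 5 can be followed by hand: setting the right-hand side of the variance equation to zero gives the quadratic $2fP - (h^2/r)P^2 + g^2 q = 0$, which has one nonnegative and one nonpositive root when $q > 0$. The sketch below computes both roots, selects the nonnegative one, and evaluates the resulting filter pole $f - Kh$. The scalar symbols and numerical values are assumptions made for the illustration.

```python
import math

def steady_state_roots(f, g, h, q, r):
    """Both roots of 2 f P - (h^2 / r) P^2 + g^2 q = 0 (scalar constant case)."""
    a = h * h / r
    disc = math.sqrt(f * f + a * g * g * q)
    return (f + disc) / a, (f - disc) / a

def residual(p, f, g, h, q, r):
    """Right-hand side of the scalar variance equation evaluated at P = p."""
    return 2.0 * f * p - (h * h / r) * p * p + g * g * q

# Assumed values; the message model is unstable because f > 0.
f, g, h, q, r = 0.5, 1.0, 1.0, 1.0, 2.0

p_plus, p_minus = steady_state_roots(f, g, h, q, r)
p_bar = p_plus                      # the nonnegative root is the one to keep
gain = p_bar * h / r                # K = P H' R^{-1} in scalar form
filter_pole = f - gain * h          # dynamics of the estimation error

print("roots            :", p_plus, p_minus)
print("steady-state gain:", gain)
print("filter pole      :", filter_pole)
```

In this scalar setting the filter pole works out to $-\sqrt{f^2 + h^2 g^2 q / r}$, which is negative whatever the sign of $f$, in line with the stability statement of Theorem 4.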

(11) Solution of the Dual Problem. For details, consult [17]. 
The only facts needed here are the following: The optimal con- 
trol law is given by 


$$u^*(t^*) = -K^*(t^*)x^*(t^*) \qquad (34)$$

where $K^*(t^*)$ satisfies the duality relation 

$$K^*(t^*) = K'(t) \qquad (35)$$


and is to be determined by duality from formula (III). The 
value of the performance index (20) may be written in the form 


$$\min_{u^*} V(x^*; t^*, t_0^*, u^*) = \|x^*\|^2_{\Pi^*(t^*;\, x^*,\, t_0^*)}$$


where $\Pi^*(t^*; x^*, t_0^*)$ is the solution of the dual of the variance 
equation (IV). 

It should be carefully noted that the hypotheses of Theorem 4 
are invariant under duality. Hence essentially the same theory 
covers both the estimation and the regulator problem, as stated in 
Section 5. 

The vector-matrix block diagram for the optimal regulator is 
shown in Fig. 11. 


Fig. 11 General block diagram of optimal regulator 


(12) Computation of the covariance matrix for the message process. 
To apply Theorem 1, it is necessary to determine $\operatorname{cov}[x(t_0), x(t_0)]$. 
This may be specified as part of the problem statement as in 
Example 5. On the other hand, one might assume that the message 
model has reached steady state (see (A$_3$)), in which case from 
(13) and (12) we have that 


$$S(t) = \operatorname{cov}[x(t), x(t)] = \int_{-\infty}^{t} \Phi(t, \tau)G(\tau)Q(\tau)G'(\tau)\Phi'(t, \tau)\,d\tau$$


provided the model (10) is asymptotically stable. Differentiating 
this expression with respect to $t$ we obtain the following differential 
equation for $S(t)$ 


$$dS/dt = F(t)S + SF'(t) + G(t)Q(t)G'(t) \qquad (36)$$


This formula is analogous to the well-known lemma of Lyapunov 
[21] in evaluating the integrated square of a solution of a linear 
differential equation. In case of a constant system, (36) reduces 
to a system of linear algebraic equations. 
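
As a worked instance of the constant case, take a first-order stable model with scalars $f < 0$, $g$, $q$ (our own illustrative choice), for which the transition matrix is $e^{f(t - \tau)}$. The integral for the message covariance and the algebraic equation obtained by setting $dS/dt = 0$ then give the same value:

```latex
\begin{aligned}
S &= \int_{-\infty}^{t} e^{f(t-\tau)}\, g\, q\, g\, e^{f(t-\tau)}\, d\tau
   = g^2 q \int_{0}^{\infty} e^{2 f s}\, ds
   = -\frac{g^2 q}{2 f} \qquad (f < 0), \\[4pt]
0 &= \frac{dS}{dt} = 2 f S + g^2 q
   \;\;\Longrightarrow\;\; S = -\frac{g^2 q}{2 f} .
\end{aligned}
```

For $f \ge 0$ the integral diverges, which matches the requirement that the model be asymptotically stable for the steady-state covariance to exist.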


8 Derivation of the Fundamental Equations 


We first deduce the matrix form of the familiar Wiener-Hopf 
integral equation. Differentiating it with respect to time and 
then using (10-11), we obtain in a very simple way the funda- 
mental equations of our theory. 

Much cumbersome manipulation of integrals can be avoided by 
recognizing, as has been pointed out by Pugachev [27], that the 
Wiener-Hopf equation is a special case of a simple geometric 
principle: orthogonal projection. 

Consider an abstract space $\mathfrak{X}$ such that an inner product $(X, Y)$ 
is defined between any two elements $X$, $Y$ of $\mathfrak{X}$. The norm is 
defined by $\|X\| = (X, X)^{1/2}$. Let $\mathfrak{U}$ be a subspace of $\mathfrak{X}$. We 
seek a vector $U_0$ in $\mathfrak{U}$ which minimizes $\|X - U\|$ with respect 
to any $U$ in $\mathfrak{U}$. If such a minimizing vector exists, it may be 
characterized in the following way: 

ORTHOGONAL PROJECTION LEMMA. $\|X - U\| \ge 
\|X - U_0\|$ for all $U$ in $\mathfrak{U}$ (i) if and (ii) only if 


$$(X - U_0, U) = 0 \quad \text{for all } U \text{ in } \mathfrak{U} \qquad (37)$$


(iii) Moreover, if there is another vector $U_0'$ satisfying (37), then 
$\|U_0 - U_0'\| = 0$. 
Proof. (i), (iii) Consider the identity 


$$\|X - U\|^2 = \|X - U_0\|^2 + 2(X - U_0, U_0 - U) + \|U_0 - U\|^2$$


Since $\mathfrak{U}$ is a linear space, it contains $U - U_0$; hence if Condition 
(37) holds, the middle term vanishes and therefore $\|X - U\| \ge 
\|X - U_0\|$. Property (iii) is obvious. 

(ii) Suppose there is a vector $U_1$ such that $(X - U_0, U_1) = \alpha 
\neq 0$. Then 


$$\|X - U_0 - \beta U_1\|^2 = \|X - U_0\|^2 - 2\alpha\beta + \beta^2\|U_1\|^2$$


For a suitable choice of $\beta$, the sum of the last two terms will be 
negative, contradicting the optimality of $U_0$. Q.E.D. 
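
A finite-dimensional analogue may help fix ideas. In the sketch below the abstract space is ordinary three-dimensional euclidean space, the subspace is spanned by two assumed vectors `u1` and `u2`, and the minimizing vector is found from the $2 \times 2$ normal equations. The code then checks that the residual is orthogonal to the subspace and that other candidates in the subspace are no closer. All vectors and the trial coefficients are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def combine(c1, u1, c2, u2):
    """Linear combination c1 * u1 + c2 * u2."""
    return [c1 * a + c2 * b for a, b in zip(u1, u2)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(x, u1, u2):
    """Orthogonal projection of x onto span{u1, u2} via the 2x2 normal equations."""
    g11, g12, g22 = dot(u1, u1), dot(u1, u2), dot(u2, u2)
    b1, b2 = dot(x, u1), dot(x, u2)
    det = g11 * g22 - g12 * g12   # nonzero when u1 and u2 are linearly independent
    c1 = (b1 * g22 - b2 * g12) / det
    c2 = (g11 * b2 - g12 * b1) / det
    return combine(c1, u1, c2, u2)

# Assumed example data.
x = [1.0, 2.0, 3.0]
u1 = [1.0, 0.0, 0.0]
u2 = [1.0, 1.0, 0.0]

u0 = project(x, u1, u2)
residual_vec = [a - b for a, b in zip(x, u0)]

print("projection U0        :", u0)
print("(X - U0, u1), (.., u2):", dot(residual_vec, u1), dot(residual_vec, u2))

# Any other element of the subspace is at least as far from x.
trial_coefficients = [(-1.0, 2.5), (0.0, 0.0), (3.0, -1.0), (-0.9, 2.0)]
best = distance(x, u0)
for c1, c2 in trial_coefficients:
    print(c1, c2, distance(x, combine(c1, u1, c2, u2)) >= best)
```

The orthogonality of the residual to every spanning vector is exactly the condition that characterizes the minimizer in the lemma; in the estimation problem the same condition, applied to random variables, yields the Wiener-Hopf equation.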

Using this lemma, it is easy to show: 

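WIENER-HOPF EQUATION. A necessary and sufficient 
condition for $[x^*, \hat{x}(t_1|t)]$ (where $\hat{x}(t_1|t)$ is defined by (14)) to be a 
minimum variance estimator of $[x^*, x(t_1)]$ for all $x^*$, is that the 
matrix function $A(t_1, \tau)$ satisfy the relation 


$$\operatorname{cov}[x(t_1), z(\sigma)] - \int_{t_0}^{t} A(t_1, \tau)\operatorname{cov}[z(\tau), z(\sigma)]\,d\tau = 0 \qquad (38)$$

or equivalently, 

$$\operatorname{cov}[\tilde{x}(t_1|t), z(\sigma)] = 0 \qquad (39)$$

for all $t_0 \le \sigma < t$. 

COROLLARY. $\operatorname{cov}[\tilde{x}(t_1|t), \hat{x}(t_1|t)] = 0 \qquad (40)$ 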


Proof. Let $x^*$ be a fixed costate and denote by $\mathfrak{X}$ the space of 
all scalar random variables $[x^*, x(t_1)]$ of zero mean and finite 
variance. The inner product is defined as $(X, Y) = \mathcal{E}[x^*, 
x(t_1)] \cdot [x^*, y(t_1)]$. The subspace $\mathfrak{U}$ is the set of all scalar random 
variables of the type 


$$U = \left[x^*, \int_{t_0}^{t} B(t_1, \tau)z(\tau)\,d\tau\right]$$


(where $B(t_1, \tau)$ is an $n \times p$ matrix continuously differentiable in 
both arguments). We write $U_0$ for the estimate $[x^*, \hat{x}(t_1|t)]$. 

We now apply the orthogonal projection lemma and find that 
condition (37) takes the form 


$$(X - U_0, U) = \mathcal{E}[x^*, \tilde{x}(t_1|t)] \left[x^*, \int_{t_0}^{t} B(t_1, \tau)z(\tau)\,d\tau\right] = x^* \operatorname{cov}\left[\tilde{x}(t_1|t), \int_{t_0}^{t} B(t_1, \tau)z(\tau)\,d\tau\right] x^{*\prime} = 0$$


Interchanging integration and the expected value operation 
(permissible in view of the continuity assumptions made under 
(A$_1$), see [28]), we get 


$$(X - U_0, U) = x^* \left\{ \int_{t_0}^{t} \operatorname{cov}[\tilde{x}(t_1|t), z(\sigma)]B'(t_1, \sigma)\,d\sigma \right\} x^{*\prime} = 0$$


This expression must vanish for all $x^*$. Sufficiency of (39) is 
obvious. To prove the necessity, we take $B(t_1, \sigma) = \operatorname{cov}[\tilde{x}(t_1|t), 
z(\sigma)]$. Then $BB'$ is nonnegative definite. By continuity, the 
integral will be positive for some $x^*$ unless $BB'$ and therefore also 
$B(t_1, \sigma)$ vanishes identically for all $t_0 \le \sigma < t$. The Corollary 
follows trivially by multiplying (39) on the right by $A'(t_1, \sigma)$ and 
integrating with respect to $\sigma$. Q.E.D. 

Remark. (m) Equation (39) does not hold when $\sigma = t$. In 
fact, $\operatorname{cov}[\tilde{x}(t|t), z(t)] = K(t)R(t)$. 
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For the moment we assume for simplicity that 4; = ¢. Differen- 
tiating (38) with respect to ¢, and interchanging 0/d¢ and & we 
get for S <1, 


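$$\frac{\partial}{\partial t}\operatorname{cov}[x(t), z(\sigma)] = F(t)\operatorname{cov}[x(t), z(\sigma)] + G(t)\operatorname{cov}[u(t), z(\sigma)] \tag{41}$$

and

$$\frac{\partial}{\partial t}\int_{t_0}^{t} A(t, \tau)\operatorname{cov}[z(\tau), z(\sigma)]\,d\tau
= \frac{\partial}{\partial t}\left\{\int_{t_0}^{t} A(t, \tau)\operatorname{cov}[y(\tau), y(\sigma)]\,d\tau + A(t, \sigma)R(\sigma)\right\}$$

$$= \int_{t_0}^{t}\frac{\partial A(t, \tau)}{\partial t}\operatorname{cov}[z(\tau), z(\sigma)]\,d\tau + A(t, t)\operatorname{cov}[y(t), y(\sigma)] \tag{42}$$

The last term in (41) vanishes because of the independence of
$u(t)$ of $v(\sigma)$ and $x(\sigma)$ when $\sigma < t$. Further,

$$\operatorname{cov}[y(t), y(\sigma)] = H(t)\operatorname{cov}[x(t), z(\sigma)] - \operatorname{cov}[y(t), v(\sigma)] \tag{43}$$

As before, the last term again vanishes. Combining (41-43), we
get, bearing in mind also (38),

$$\int_{t_0}^{t}\left[F(t)A(t, \tau) - \frac{\partial A(t, \tau)}{\partial t} - A(t, t)H(t)A(t, \tau)\right]\operatorname{cov}[z(\tau), z(\sigma)]\,d\tau = 0 \tag{44}$$

for all $t_0 \le \sigma < t$. This condition is certainly satisfied if the
optimal operator $A(t, \tau)$ is a solution of the differential equation

$$F(t)A(t, \tau) - \frac{\partial A(t, \tau)}{\partial t} - A(t, t)H(t)A(t, \tau) = 0 \tag{45}$$

for all values of the parameter $\tau$ lying in the interval $t_0 \le \tau \le t$.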

If $R(\tau)$ is positive definite in this interval, then condition (45) is
necessary. In fact, let $B(t, \tau)$ denote the bracketed term in (44).
If $A(t, \tau)$ satisfies the Wiener-Hopf equation (38), then $\hat x(t|t)$
given by (14) is an optimal estimate; and the same holds also for

$$\hat x(t|t) + \int_{t_0}^{t} B(t, \tau)z(\tau)\,d\tau$$

since by (44) $A(t, \tau) + B(t, \tau)$ also satisfies the Wiener-Hopf
equation. But by the lemma, the norm of the difference of two
optimal estimates is zero. Hence

$$x^*\left\{\int_{t_0}^{t}\!\!\int_{t_0}^{t} B(t, \tau)\operatorname{cov}[z(\tau), z(\tau')]\,B'(t, \tau')\,d\tau\,d\tau'\right\}x^{*\prime} = 0 \tag{46}$$

for all $x^*$. By the assumptions of Section 4, $y(\tau)$ and $v(\tau)$ are
uncorrelated and therefore

$$\operatorname{cov}[z(\tau), z(\tau')] = R(\tau)\delta(\tau - \tau') + \operatorname{cov}[y(\tau), y(\tau')]$$

Substituting this into the integral (46), the contribution of the
second term on the right is nonnegative while the contribution of
the first term is positive unless (45) holds (because of the positive
definiteness of $R(\tau)$), which concludes the proof.

Differentiating (14) with respect to $t$ we find

$$d\hat x(t|t)/dt = \int_{t_0}^{t}\frac{\partial A(t, \tau)}{\partial t}z(\tau)\,d\tau + A(t, t)z(t)$$

Using the abbreviation $A(t, t) = K(t)$ as well as (45) and (14),
we obtain at once the differential equation of the optimal filter:

$$d\hat x(t|t)/dt = F(t)\hat x(t|t) + K(t)[z(t) - H(t)\hat x(t|t)] \tag{I}$$

Combining (10) and (I), we obtain the differential equation for
the error of the optimal estimate:

$$d\tilde x(t|t)/dt = [F(t) - K(t)H(t)]\tilde x(t|t) + G(t)u(t) - K(t)v(t) \tag{II}$$
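To obtain an explicit expression for $K(t)$, we observe first that
(39) implies the following identity in the interval $t_0 \le \sigma < t$:

$$\operatorname{cov}[x(t), y(\sigma)] - \int_{t_0}^{t} A(t, \tau)\operatorname{cov}[y(\tau), y(\sigma)]\,d\tau = A(t, \sigma)R(\sigma) \tag{39'}$$

Since both sides of (39') are continuous functions of $\sigma$, it is clear
that equality holds also for $\sigma = t$. Therefore

$$K(t)R(t) = \operatorname{cov}[\tilde x(t|t), y(t)] = \operatorname{cov}[\tilde x(t|t), x(t)]\,H'(t)$$

By (40), we have then

$$K(t)R(t) = \operatorname{cov}[\tilde x(t|t), \tilde x(t|t)]\,H'(t) = P(t)H'(t)$$

Since $R(t)$ is assumed to be positive definite, it is invertible and
therefore

$$K(t) = P(t)H'(t)R^{-1}(t) \tag{III}$$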


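We can now derive the variance equation. Let $\Psi(t, \tau)$ be the
common transition matrix of (I) and (II). Then

$$P(t) - \Psi(t, t_0)P(t_0)\Psi'(t, t_0) = \mathcal{E}\int_{t_0}^{t}\!\!\int_{t_0}^{t}\Psi(t, \tau)[G(\tau)u(\tau) - K(\tau)v(\tau)][G(\sigma)u(\sigma) - K(\sigma)v(\sigma)]'\Psi'(t, \sigma)\,d\tau\,d\sigma$$

Using the fact that $u(t)$ and $v(t)$ are uncorrelated white noise, the
integral simplifies to

$$\int_{t_0}^{t}\Psi(t, \tau)[G(\tau)Q(\tau)G'(\tau) + K(\tau)R(\tau)K'(\tau)]\Psi'(t, \tau)\,d\tau$$

Differentiating with respect to $t$ and using (III), we obtain after
easy calculations the variance equation

$$dP/dt = F(t)P + PF'(t) - PH'(t)R^{-1}(t)H(t)P + G(t)Q(t)G'(t) \tag{IV}$$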
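
As a numerical illustration of (III) and (IV), the following minimal sketch integrates the variance equation for constant matrices with a fixed-step fourth-order Runge-Kutta scheme. The helper names, the integration scheme, and the double-integrator model used at the end (F, G, H, Q, R chosen here for illustration) are assumptions of this sketch and are not taken from the derivation above.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def axpy(A, B, s=1.0):
    """Return A + s*B elementwise."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def variance_rhs(P, F, G, H, Q, R_inv):
    """Right-hand side of (IV): F P + P F' - P H' R^-1 H P + G Q G'."""
    Ft, Gt, Ht = transpose(F), transpose(G), transpose(H)
    gain = mat_mul(mat_mul(P, Ht), R_inv)          # K = P H' R^-1, equation (III)
    drift = axpy(mat_mul(F, P), mat_mul(P, Ft))    # F P + P F'
    correction = mat_mul(mat_mul(gain, H), P)      # P H' R^-1 H P
    excitation = mat_mul(mat_mul(G, Q), Gt)        # G Q G'
    return axpy(axpy(drift, correction, -1.0), excitation)


def integrate_variance(P0, F, G, H, Q, R_inv, t_end, steps):
    """Classical RK4 integration of (IV) from t = 0 to t_end (constant coefficients)."""
    h = t_end / steps
    P = [row[:] for row in P0]
    for _ in range(steps):
        k1 = variance_rhs(P, F, G, H, Q, R_inv)
        k2 = variance_rhs(axpy(P, k1, h / 2), F, G, H, Q, R_inv)
        k3 = variance_rhs(axpy(P, k2, h / 2), F, G, H, Q, R_inv)
        k4 = variance_rhs(axpy(P, k3, h), F, G, H, Q, R_inv)
        incr = axpy(axpy(k1, k2, 2.0), axpy(k4, k3, 2.0))   # k1 + 2 k2 + 2 k3 + k4
        P = axpy(P, incr, h / 6)
    return P


# Illustrative model (an assumption): double integrator driven by white noise,
# position observed, q = r = 1.
F = [[0.0, 1.0], [0.0, 0.0]]
G = [[0.0], [1.0]]
H = [[1.0, 0.0]]
Q = [[1.0]]
R_inv = [[1.0]]
P_final = integrate_variance([[1.0, 0.0], [0.0, 1.0]], F, G, H, Q, R_inv, 20.0, 4000)
```

For this particular model the equilibrium of (IV) can be found by hand ($p_{12} = 1$, $p_{11} = p_{22} = \sqrt{2}$), which gives a simple check on the integration.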
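Alternately, we could write

$$dP/dt = d\operatorname{cov}[\tilde x, \tilde x]/dt = \operatorname{cov}[d\tilde x/dt, \tilde x] + \operatorname{cov}[\tilde x, d\tilde x/dt]$$

and evaluate the right-hand side by means of (II). A typical
covariance matrix to be computed is

$$\operatorname{cov}[\tilde x(t|t), u(t)] = \operatorname{cov}\left[\int_{t_0}^{t}\Psi(t, \tau)[G(\tau)u(\tau) - K(\tau)v(\tau)]\,d\tau,\ u(t)\right] = (1/2)G(t)Q(t)$$

the factor 1/2 following from properties of the $\delta$-function.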
To complete the derivations, we note that, if $t_1 > t$, then by
(3)

$$x(t_1) - \Phi(t_1, t)x(t) = \int_{t}^{t_1}\Phi(t_1, \tau)G(\tau)u(\tau)\,d\tau$$

Since $u(\tau)$ for $t < \tau \le t_1$ is independent of $x(\tau)$ in the interval
$t_0 \le \tau \le t$, it follows by (38) that the optimal estimator for the
right-hand side above is 0. Hence

$$\hat x(t_1|t) = \Phi(t_1, t)\hat x(t|t) \quad (t_1 \ge t) \tag{V}$$

The same conclusion does not follow if $t_1 < t$ because of lack of
independence between $x(\tau)$ and $u(\tau)$.

The only point remaining in the proof of Theorem 1 is to
determine the initial conditions for (IV). From (38) it is clear that

$$\hat x(t_0|t_0) = 0$$

Hence

$$P(t_0) = \operatorname{cov}[\tilde x(t_0|t_0), \tilde x(t_0|t_0)] = \operatorname{cov}[x(t_0), x(t_0)]$$

In case of the conventional Wiener theory (see ($A_3'$)), the last term
is evaluated by means of (36).
This completes the proof of Theorem 1. 


9 Outline of Proofs


Using the duality relations (16), all proofs can be reduced to 
those given for the regulator problem in [17]. 

(1) The fact that solutions of the variance equation exist for
all $t \ge t_0$ is proved in [17, Theorem (6.4)], using the fact that the
variance of $x(t)$ must be finite in any finite interval $[t_0, t]$.

(2) Theorem 3 is proved by showing that there exists a particu- 
lar estimate of finite but not necessarily minimum variance. 
Under (A,’), this is proved in [17; Theorem (6.6)]. A trivial 
modification of this proof goes through also with assumption 
(A,). 

(3) Theorem 4 is proved in [17; Theorems (6.8), (6.10), (7.2)]. 
The stability of the optimal filter is proved by noting that the 
estimation error plays the role of a Lyapunov function. The 
stability of the variance equation is proved by exhibiting a 
Lyapunov function for P. This Lyapunov function in the 
simplest case is discussed briefly at the end of Example 1. While 
this theorem is true also in the nonconstant case, at present one 
must impose the somewhat restrictive conditions (Ae — Az). 


10 Analytic Solution of the Variance Equation 
Let $X(t)$, $W(t)$ be the (unique) matrix solution pair for (27)
which satisfy the initial conditions

$$X(t_0) = I, \quad W(t_0) = P_0 \tag{47}$$

Then we have the following identity

$$W(t) = P(t)X(t), \quad t \ge t_0 \tag{48}$$

which is easily verified by substituting (48) with (IV) into (27).
On the other hand, in view of (47-48), we see immediately from
the first set of equations (27) that $X(t)$ is the transition matrix
of the differential equation

$$dx/dt = -F'(t)x + H'(t)R^{-1}(t)H(t)P(t)x$$

which is the adjoint of the differential equation (I) of the
optimal filter. Since the inverse of a transition matrix always
exists, we can write

$$P(t) = W(t)X^{-1}(t), \quad t \ge t_0 \tag{49}$$

This formula may not be valid for $t < t_0$, for then $P(t)$ may not
exist!

Only trivial steps remain to complete the proof of Theorem 2.


11 Examples: Solution 


Example 1. If $q_{11} > 0$ and $r_{11} > 0$, it is easily verified that the
conditions of Theorems 3-4 are satisfied. After trivial
substitutions in (III-IV) we obtain the expression for the optimal
gain

$$k_{11}(t) = p_{11}(t)/r_{11} \tag{50}$$

and the variance equation

$$dp_{11}/dt = 2f_{11}p_{11} - p_{11}^2/r_{11} + q_{11} \tag{51}$$

By setting the right-hand side of (51) equal to zero, by virtue of
the corollary of Theorem 4 we obtain the solution of the stationary
problem (i.e., $t_0 = -\infty$, see ($A_3'$)):

$$\bar p_{11} = \left[f_{11} + \sqrt{f_{11}^2 + q_{11}/r_{11}}\,\right]r_{11} \tag{52}$$

Since $p_{11}$ and $r_{11}$ are nonnegative, it is clear that only the positive
sign is permissible in front of the square root.

Substituting into (50), we get the following expressions for the
optimal gain

$$\bar k_{11} = f_{11} + \sqrt{f_{11}^2 + q_{11}/r_{11}} \tag{53}$$

and for the infinitesimal transition matrix (i.e., reciprocal time
constant)

$$\bar f_{11} = f_{11} - \bar k_{11} = -\sqrt{f_{11}^2 + q_{11}/r_{11}} \tag{54}$$


of the optimal filter. We see, in accordance with Theorem 4, that 
the optimal filter is always stable, irrespective of the stability of 
the message model. Fig. 1(b) shows the configuration of the 
optimal filter. 

It is easily checked that the formulas (52-54) agree with the re- 
sults of the conventional Wiener theory [29]. 
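
The stationary formulas (52)-(54) can be checked directly against the variance equation (51). The short sketch below is illustrative only; the function names and the numerical values of $f_{11}$, $q_{11}$, $r_{11}$ are assumptions chosen for the example.

```python
import math


def stationary_example1(f11, q11, r11):
    """Steady-state variance, gain and filter pole for the scalar model, per (52)-(54)."""
    root = math.sqrt(f11 * f11 + q11 / r11)
    p_bar = (f11 + root) * r11      # (52), positive root
    k_bar = p_bar / r11             # (53): equals f11 + root
    f_bar = f11 - k_bar             # (54): equals -root, always negative
    return p_bar, k_bar, f_bar


def variance_rhs(p, f11, q11, r11):
    """Right-hand side of the scalar variance equation (51)."""
    return 2.0 * f11 * p - p * p / r11 + q11


# Unstable message model (f11 > 0): the filter pole is still negative.
p_bar, k_bar, f_bar = stationary_example1(f11=1.0, q11=3.0, r11=1.0)
residual = variance_rhs(p_bar, 1.0, 3.0, 1.0)
```

With these values the square root equals 2, so the steady-state variance and gain are both 3 and the filter pole is -2, even though the message model itself is unstable.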

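Let us now compute the solution of the problem for a finite
smoothing interval ($t_0 > -\infty$). The Hamiltonian equations
(27) in this case are:

$$dx_1/dt = -f_{11}x_1 + (1/r_{11})w_1$$
$$dw_1/dt = q_{11}x_1 + f_{11}w_1$$

Let $T$ be the matrix of coefficients of these equations.

To compute the transition matrix $\Theta(t, t_0)$ corresponding to $T$,
we note first that the eigenvalues of $T$ are $\pm\bar f_{11}$. Using this fact
and constancy, it follows that

$$\Theta(t, t_0) = \exp T(t - t_0) = C_1\exp[(t - t_0)\bar f_{11}] + C_2\exp[-(t - t_0)\bar f_{11}]$$

where the constant matrices $C_1$ and $C_2$ are uniquely determined by
the requirements

$$\Theta(t_0, t_0) = C_1 + C_2 = I = \text{unit matrix}$$
$$d\Theta(t, t_0)/dt\,|_{t = t_0} = T\Theta(t, t_0)|_{t = t_0} = T = \bar f_{11}(C_1 - C_2)$$

After a good deal of algebra, we obtain

$$\Theta(t_0 + \tau, t_0) = \begin{bmatrix}
\cosh\bar f_{11}\tau - \dfrac{f_{11}}{\bar f_{11}}\sinh\bar f_{11}\tau & \dfrac{1}{r_{11}\bar f_{11}}\sinh\bar f_{11}\tau \\[2ex]
\dfrac{q_{11}}{\bar f_{11}}\sinh\bar f_{11}\tau & \cosh\bar f_{11}\tau + \dfrac{f_{11}}{\bar f_{11}}\sinh\bar f_{11}\tau
\end{bmatrix} \tag{55}$$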

Knowledge of $\Theta(t, t_0)$ can be used to derive explicit solutions
to a variety of nonstationary filtering problems.

We consider only one such problem, which was treated by
Shinbrot [3, Example 2]. He assumes that $f_{11} < 0$ and that the
message process has reached steady-state. From (36) we see that

$$\mathcal{E}x_1^2(t) = -q_{11}/2f_{11} \quad \text{for all } t$$

We assume that the observations of the signal start at $t = 0$.
Since the estimates must be unbiased, it is clear that $\hat x_1(0) = 0$.
Therefore

$$p_{11}(0) = \mathcal{E}\tilde x_1^2(0) = \mathcal{E}x_1^2(0) = -q_{11}/2f_{11}$$

Substituting this into (55), we get Shinbrot's formula:

$$p_{11}(t) = q_{11}\,\frac{(f_{11} - \bar f_{11})e^{\bar f_{11}t} - (f_{11} + \bar f_{11})e^{-\bar f_{11}t}}{-(f_{11} - \bar f_{11})^2e^{\bar f_{11}t} + (f_{11} + \bar f_{11})^2e^{-\bar f_{11}t}}$$

Since $\bar f_{11} < 0$, we see that as $t \to \infty$, $p_{11}(t)$ converges to

$$\bar p_{11} = -q_{11}/(f_{11} + \bar f_{11}) = (f_{11} - \bar f_{11})r_{11}$$

which agrees with (52).
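
The closed-form expression can be compared with a direct numerical integration of the variance equation (51) started from $p_{11}(0) = -q_{11}/2f_{11}$. The sketch below is illustrative; the function names, the Runge-Kutta step count, and the parameter values are assumptions.

```python
import math


def shinbrot_p11(t, f11, q11, r11):
    """Closed-form variance for the stationary-start problem (requires f11 < 0)."""
    fb = -math.sqrt(f11 * f11 + q11 / r11)          # f-bar from (54)
    ep, em = math.exp(fb * t), math.exp(-fb * t)
    num = q11 * ((f11 - fb) * ep - (f11 + fb) * em)
    den = -(f11 - fb) ** 2 * ep + (f11 + fb) ** 2 * em
    return num / den


def integrate_variance(t_end, f11, q11, r11, steps=2000):
    """RK4 integration of dp/dt = 2 f11 p - p^2/r11 + q11 from p(0) = -q11/(2 f11)."""
    def rhs(p):
        return 2.0 * f11 * p - p * p / r11 + q11

    p = -q11 / (2.0 * f11)
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


f11, q11, r11 = -1.0, 3.0, 1.0
closed = shinbrot_p11(1.0, f11, q11, r11)
numeric = integrate_variance(1.0, f11, q11, r11)
limit = shinbrot_p11(20.0, f11, q11, r11)   # should approach (f11 - f_bar) r11 = 1
```

For these values the initial variance is 1.5 and the stationary variance from (52) is 1, so the transient is a monotone decay between the two.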
To understand better the factors affecting convergence to the
steady-state, let

$$\delta p_{11}(t) = p_{11}(t) - \bar p_{11}$$

The differential equation for $\delta p_{11}$ is

$$d\,\delta p_{11}/dt = 2\bar f_{11}\,\delta p_{11} - (\delta p_{11})^2/r_{11} \tag{56}$$

We now introduce a Lyapunov function [21] for (56)

$$V(\delta p_{11}) = (\delta p_{11}/\bar p_{11})^2$$

The derivative of $V$ along motions of (51) is given by

$$\dot V(\delta p_{11}) = -2[p_{11}/r_{11} + q_{11}/\bar p_{11}]\,V(\delta p_{11}) \tag{57}$$

This shows clearly that the "equivalent reciprocal time constant"
for the variance equation depends on two quantities: (i) the
message-to-noise ratio $p_{11}/r_{11}$ at the input of the optimal filter,
(ii) the ratio of excitation to estimation error $q_{11}/\bar p_{11}$.
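
The identity (57) can be verified numerically by comparing the derivative of $V$ computed directly from (51) with the right-hand side of (57). This is a sketch under the assumption that $V = (\delta p_{11}/\bar p_{11})^2$ with $\bar p_{11}$ given by (52); the function name and test values are illustrative.

```python
import math


def lyapunov_rates(p, f11, q11, r11):
    """Return (dV/dt computed from (51), dV/dt predicted by (57)) at variance p."""
    p_bar = (f11 + math.sqrt(f11 * f11 + q11 / r11)) * r11   # (52)
    dp_dt = 2.0 * f11 * p - p * p / r11 + q11                # (51)
    delta = p - p_bar
    v = (delta / p_bar) ** 2
    direct = 2.0 * delta * dp_dt / p_bar ** 2                # chain rule on V
    predicted = -2.0 * (p / r11 + q11 / p_bar) * v           # (57)
    return direct, predicted


pairs = [lyapunov_rates(p, -1.0, 3.0, 1.0) for p in (0.2, 1.5, 4.0)]
```

Both numbers agree for any positive $p$, which reflects the fact that (57) is an exact identity and not a linearization about the equilibrium.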

Since the message model in this example is identical with its
dual, it is clear that the preceding results apply without any
modification to the dual problem. In particular, the filter shown in
Fig. 1(b) is the same as the optimal regulator for a plant with
transfer function $1/(s - f_{11})$. The Hamiltonian equations (27)
for the dual problem were derived by Rozonoér [19] from
Pontryagin's maximum principle.

Let us conclude this example by making some observations
about the nonconstant case. First, the expression for the
derivative of the Lyapunov function given by (57) remains true
without any modification. Second, assume $p_{11}(t_0)$ has been
evaluated somehow. Given this number, $p_{11}(t)$ can be evaluated for
$t \ge t_0$ by means of the variance equation (51); the existence of a
Lyapunov function and in particular (57) shows that this
computation is stable, i.e., not adversely affected by roundoff errors.
Third, knowing $p_{11}(t)$, equation (57) provides a clear picture of
the transient behavior of the optimal filter, even though it might
be impossible to solve (51) in closed form.

Example 2. The variance equation is

$$dp_{11}/dt = 2f_{11}p_{11} - p_{11}^2(1/r_{11} + 1/r_{22}) + q_{11}$$

If $q_{11} > 0$, $r_{11} > 0$, and $r_{22} > 0$, the conditions of Theorems 3-4
are satisfied. Therefore the minimum error variance in the
steady-state is

$$\bar p_{11} = \frac{f_{11} + \sqrt{f_{11}^2 + q_{11}/r_{11} + q_{11}/r_{22}}}{1/r_{11} + 1/r_{22}}$$

and the optimal steady-state gains are

$$\bar k_{1i} = \bar p_{11}/r_{ii}, \quad i = 1, 2$$


The same problem has been considered also by Westcott [30, 
Example]. A glance at his calculations shows that ours is the 
simpler and more natural approach. 

Example 3. The variance equation is

$$dp_{11}/dt = -p_{12}^2/r_{11} + q_{11}$$
$$dp_{12}/dt = p_{11} - p_{12} - p_{12}p_{22}/r_{11} \tag{58}$$
$$dp_{22}/dt = 2(p_{12} - p_{22}) - p_{22}^2/r_{11}$$

If $q_{11} > 0$, $r_{11} > 0$, the conditions of Theorems 3-4 are satisfied.
Setting the right-hand side of (58) equal to zero, we get the
solution of the stationary problem:

$$\bar k_{11} = \sqrt{q_{11}/r_{11}}$$
$$\bar k_{21} = -1 + \sqrt{1 + 2\sqrt{q_{11}/r_{11}}}$$

See Fig. 3(b).

The infinitesimal transition matrix of the optimal filter in the
steady-state is:

$$\bar F_{\mathrm{opt}} = \begin{bmatrix} 0 & -\sqrt{q_{11}/r_{11}} \\ 1 & -\sqrt{1 + 2\sqrt{q_{11}/r_{11}}} \end{bmatrix}$$

The natural frequency of the filter is $(q_{11}/r_{11})^{1/4}$ and the damping
ratio is $(1/2)[2 + (r_{11}/q_{11})^{1/2}]^{1/2}$. Even for such a very simple
problem, the parameters of the optimal filter are not at all obvious
by inspection.
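
The quoted natural frequency and damping ratio can be recovered from the eigenvalues of the steady-state filter matrix. The sketch below assumes that this matrix has first row $(0, -\bar k_{11})$ and second row $(1, -1 - \bar k_{21})$, with $\bar k_{11} = \sqrt{q_{11}/r_{11}}$ and $\bar k_{21} = -1 + \sqrt{1 + 2\sqrt{q_{11}/r_{11}}}$; the function name and the numerical values are illustrative.

```python
import cmath
import math


def example3_filter(q11, r11):
    """Eigenvalues, natural frequency and damping ratio of the assumed filter matrix."""
    s = math.sqrt(q11 / r11)
    k11 = s
    k21 = -1.0 + math.sqrt(1.0 + 2.0 * s)
    # Characteristic polynomial of [[0, -k11], [1, -1 - k21]]:
    # lambda^2 + (1 + k21) lambda + k11 = 0
    b, c = 1.0 + k21, k11
    disc = cmath.sqrt(b * b - 4.0 * c)
    lam1, lam2 = (-b + disc) / 2.0, (-b - disc) / 2.0
    omega = math.sqrt(abs(lam1 * lam2))             # natural frequency
    zeta = -(lam1 + lam2).real / (2.0 * omega)      # damping ratio
    return (lam1, lam2), omega, zeta


eigs, omega, zeta = example3_filter(q11=4.0, r11=1.0)
omega_formula = (4.0 / 1.0) ** 0.25
zeta_formula = 0.5 * math.sqrt(2.0 + math.sqrt(1.0 / 4.0))
```

The damping ratio never falls below $1/\sqrt{2}$ under these formulas, whatever the ratio $q_{11}/r_{11}$.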

The solution of the dual problem in the steady-state (see Fig. 4) 
is obtained by utilizing the duality relations 


The same result was obtained by Kipiniak [24], using the Euler 
equations of the calculus of variations. 
Example 4. The variance equation is

$$dp_{11}/dt = 2f_{12}p_{12} - p_{11}^2/r_{11} + q_{11}$$
$$dp_{12}/dt = f_{21}p_{11} + f_{12}p_{22} - p_{11}p_{12}/r_{11} \tag{59}$$
$$dp_{22}/dt = 2f_{21}p_{12} - p_{12}^2/r_{11}$$

If $f_{12} \neq 0$, $f_{21} \neq 0$, and $r_{11} > 0$, the conditions of Theorems 3-4
are satisfied. There are then two sets of possibilities for the
right-hand side of (59) to vanish for nonnegative $p_{22}$:

(A) $\bar p_{11} = \sqrt{q_{11}r_{11}}$, $\bar p_{12} = 0$, $\bar p_{22} = -(f_{21}/f_{12})\sqrt{q_{11}r_{11}}$

(B) $\bar p_{11} = \sqrt{(q_{11} + 4f_{12}f_{21}r_{11})r_{11}}$, $\bar p_{12} = 2f_{21}r_{11}$, $\bar p_{22} = (f_{21}/f_{12})\sqrt{(q_{11} + 4f_{12}f_{21}r_{11})r_{11}}$

The expression for $\bar p_{22}$ shows that Case (A) applies when $f_{12}f_{21}$ is
negative (the model is stable but not asymptotically stable)
and Case (B) applies when $f_{12}f_{21}$ is positive (the model is unstable).

The optimal filter is shown in Fig. 5(b). The optimal gains are
given by

$$\bar k_{11} = \bar p_{11}/r_{11}, \quad \bar k_{21} = \bar p_{12}/r_{11} = \begin{cases} 0 & \text{(A)} \\ 2f_{21} & \text{(B)} \end{cases}$$

If $f_{12} \neq 0$ but $f_{21} = 0$, the model is completely observable but
not completely controllable. Hence the steady-state variances
exist but the optimal filter is not necessarily asymptotically stable
since Theorem 4 is not applicable. As a matter of fact, the
optimal filter in this case is partially "open loop" and it is not
asymptotically stable.

If $f_{12} = 0$, then not even Theorem 3 is applicable. In this
case, if $f_{21} \neq 0$, equations (59) have no equilibrium state; if
$f_{21} = 0$, then equations (59) have an infinity of positive definite
equilibrium states given by:

$$\bar p_{11} = \sqrt{q_{11}r_{11}}, \quad \bar p_{12} = 0, \quad \bar p_{22} > 0$$

Thus if $f_{12} = 0$, the conclusions of Theorems 3-4 are false.
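
The two candidate equilibria of Example 4 can be checked by substituting them into the right-hand side of the variance equation. The sketch below assumes the model $F = [[0, f_{12}], [f_{21}, 0]]$ with $x_1$ observed and noise of intensity $q_{11}$ entering the first state, which gives $dp_{11}/dt = 2f_{12}p_{12} - p_{11}^2/r_{11} + q_{11}$, $dp_{12}/dt = f_{21}p_{11} + f_{12}p_{22} - p_{11}p_{12}/r_{11}$, $dp_{22}/dt = 2f_{21}p_{12} - p_{12}^2/r_{11}$. The function names are illustrative.

```python
import math


def rhs59(p11, p12, p22, f12, f21, q11, r11):
    """Right-hand side of the assumed variance equations for Example 4."""
    return (
        2.0 * f12 * p12 - p11 * p11 / r11 + q11,
        f21 * p11 + f12 * p22 - p11 * p12 / r11,
        2.0 * f21 * p12 - p12 * p12 / r11,
    )


def equilibrium(f12, f21, q11, r11):
    """Case (A) when f12*f21 < 0, Case (B) when f12*f21 > 0."""
    if f12 * f21 == 0.0:
        raise ValueError("Theorems 3-4 do not apply when f12 or f21 is zero")
    if f12 * f21 < 0.0:                                   # Case (A)
        p11 = math.sqrt(q11 * r11)
        return p11, 0.0, -(f21 / f12) * p11
    p11 = math.sqrt((q11 + 4.0 * f12 * f21 * r11) * r11)  # Case (B)
    return p11, 2.0 * f21 * r11, (f21 / f12) * p11


case_a = equilibrium(1.0, -1.0, 1.0, 1.0)
case_b = equilibrium(1.0, 1.0, 1.0, 1.0)
res_a = rhs59(*case_a, 1.0, -1.0, 1.0, 1.0)
res_b = rhs59(*case_b, 1.0, 1.0, 1.0, 1.0)
```

In both cases $\bar p_{22}$ comes out nonnegative, which is the criterion the text uses to select between (A) and (B).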
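Example 5. The variance equation is

$$dp_{11}/dt = 2p_{12} - p_{11}^2/r_{11}$$
$$dp_{12}/dt = p_{22} - p_{11}p_{12}/r_{11}$$
$$dp_{22}/dt = -p_{12}^2/r_{11}$$

We assume that $r_{11} > 0$; this assures that Theorem 3 is
applicable. We then find that the steady-state error variances are all
zero. The matrix of coefficients of the Hamiltonian equations
(27) is:

$$T = \begin{bmatrix} 0 & 0 & 1/r_{11} & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

and the corresponding transition matrix is (here (4) is a finite
series!)

$$\Theta(t_0 + \tau, t_0) = \begin{bmatrix} 1 & 0 & \tau/r_{11} & \tau^2/2r_{11} \\ -\tau & 1 & -\tau^2/2r_{11} & -\tau^3/6r_{11} \\ 0 & 0 & 1 & \tau \\ 0 & 0 & 0 & 1 \end{bmatrix}$$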

Using (29), we find (to = 0): 


This formula, obtained here with little labor, is identical with the
results of Shinbrot [3, Example 1].
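
The computation for this example can be sketched numerically. The code below assumes the initial variance $P(0) = \mathrm{diag}(0, p_{22}(0))$ and uses $P(t) = W(t)X^{-1}(t)$ from (49) with the blocks of the transition matrix of the Hamiltonian equations. The closed form it is compared against, $P(t) = (r_{11}/a(t))\,[[t^2, t], [t, 1]]$ with $a(t) = t^3/3 + r_{11}/p_{22}(0)$, is derived here under that assumption and is not quoted from the text; all function names are illustrative.

```python
def mul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add2(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def variance_from_transition(t, r11, p22_0):
    """P(t) = W X^-1 with [X; W] = Theta(t, 0) [I; P0], assuming P0 = diag(0, p22_0)."""
    theta11 = [[1.0, 0.0], [-t, 1.0]]
    theta12 = [[t / r11, t ** 2 / (2 * r11)], [-t ** 2 / (2 * r11), -t ** 3 / (6 * r11)]]
    theta22 = [[1.0, t], [0.0, 1.0]]          # the lower-left block theta21 is zero
    P0 = [[0.0, 0.0], [0.0, p22_0]]
    X = add2(theta11, mul2(theta12, P0))
    W = mul2(theta22, P0)
    return mul2(W, inv2(X))


def variance_closed_form(t, r11, p22_0):
    a = t ** 3 / 3 + r11 / p22_0
    return [[r11 * t * t / a, r11 * t / a], [r11 * t / a, r11 / a]]


P_num = variance_from_transition(2.0, 1.5, 0.7)
P_ref = variance_closed_form(2.0, 1.5, 0.7)
```

Every entry of the closed form decays like a negative power of $t$, consistent with the statement that the steady-state error variances are all zero.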

The optimal filter is shown in Fig. 7(b). The time-varying
gains tend to 0 as $t \to \infty$; in other words, the filter pays less and
less attention to the incoming signals and relies more and more on
the previous estimates of $x_1$ and $x_2$.

Since the conditions of Theorem 4 are not satisfied, one might
suspect that the optimal filter is not uniformly (and hence
exponentially [21]) asymptotically stable. To check this
conjecture, we calculate the transition matrix of the optimal filter. We
find, for $t, \tau \ge 0$,

$$\Psi(t, \tau) = \frac{1}{a(t)}\begin{bmatrix} a(t) - \beta(t, \tau)t & -a(t)\tau + a(\tau)t + \beta(t, \tau)t\tau \\ -\beta(t, \tau) & a(\tau) + \beta(t, \tau)\tau \end{bmatrix}$$

where

$$a(t) = t^3/3 + r_{11}/p_{22}(0)$$
$$\beta(t, \tau) = t^2/2 - \tau^2/2$$

Since $\psi_{11}(t, \tau)$ does not converge to zero with $t - \tau \to \infty$, it is
clear that the optimal filter is not even stable, let alone
asymptotically stable.

From the transition matrix of the optimal filter, we can obtain
at once its impulse response with respect to the input $z_1(t)$ and
output $\hat x_1(t)$:

$$\psi_{11}(t, \tau)k_{11}(\tau) + \psi_{12}(t, \tau)k_{21}(\tau) = \frac{t\tau}{t^3/3 + r_{11}/p_{22}(0)}$$

This agrees with Shinbrot's result [3].
Example 6. The variance equation is: 


dp, /dt = 2px — pu?/rn har? ree 


dpi2/dt = hu? pupie/tn (60) 


dpe/dt = —hy*pi2?/ris — + qu 


If hy + 0, qu > 0, ru > 0, ree > O, then the conditions of 
Theorems 3-4 are satisfied. Setting the right-hand side of (60) 
equal to zero leads to a very complicated algebraic problem. We 
introduce first the abbreviations: 


as V
It follows that


huku 
hukes = 


heakan 


It is easy to verify that the right-hand side of (60) vanishes for
this set of $p_{ij}$'s; by Theorem 5, this cannot happen for any other
set. Hence the solution of the stationary Wiener problem is
complete. It is interesting to note that the conventional procedure
would require here the spectral factorization of a two-by-two
matrix which is very much more difficult algebraically than by
the present method.


The infinitesimal transition matrix of the optimal filter is given 
by 


a 


a + 


, V20 + B 
a+ 


The natural frequency of the optimal filter is 


w = |A(Fon)| = Va 


and the damping ratio is 


-B 


1 B? 
V2 t 2a 


The quantities a and 6 can be regarded as signal-to-noise ratios. 
Since all parameters of the optimal filter depend only on these 
ratios, there is a possibility of building an adaptive filter once
means of experimentally measuring a and 6 are available. An in- 
vestigation of this sort was carried out by Bucy [31] in the simpli- 
fied case when hy = 8 = 0. 


= |Re A(Fopt)! /w = 


12 Problems Related to Adaptive Systems 


The generality of our results should be of considerable useful- 
ness in the theory of adaptive systems, which is as yet in a primi- 
tive stage of development. 

An adaptive system is one which changes its parameters in
accordance with measured changes in its environment. In the
estimation problem, the changing environment is reflected in the
time-dependence of F, G, H, Q, R. Our theory shows that such
changes affect only the values of the parameters but not the
structure of the optimal filter. This is what one would expect
intuitively and we now have also a rigorous proof. Under ideal
circumstances, the changes in the environment could be detected
instantaneously and exactly. The adaptive filter would then
behave as required by the fundamental equations (I-IV). In
other words, our theory establishes a basis of comparison between
actual and ideal adaptive behavior. It is clear therefore that
a fundamental problem in the theory of adaptive systems is the
further study of properties of the variance equation (IV).


13 Conclusions 


One should clearly distinguish between two aspects of the esti- 
mation problem: 




(1) The theoretical aspect. Here interest centers on: 


(i) The general form of the solution (see Fig. 1). 

(ii) Conditions which guarantee a priori the existence, physical 
realizability, and stability of the optimal filter. 

(iii) Characterization of the general results in terms of some
simple quantities, such as signal-to-noise ratio, information rate,
bandwidth, etc.


An important consequence of the time-domain approach is that 
these considerations can be completely divorced from the as- 
sumption of stationarity which has dominated much of the think- 
ing in the past. 

(2) The computational aspect. The classical (more accurately, 
old-fashioned) view is that a mathematical problem is solved if 
the solution is expressed by a formula. It is not a trivial matter, 
however, to substitute numbers in a formula. The current litera- 
ture on the Wiener problem is full of semirigorously derived 
formulas which turn out to be unusable for practical computa- 
tion when the order of the system becomes even moderately large. 
The variance equation of our approach provides a practically
useful and theoretically “clean” technique of numerical computa- 
tion. Because of the guaranteed convergence of these equations, 
the computational problem can be considered solved, except for 
purely numerical difficulties. 

Some open problems, which we intend to treat in the near 
future, are: 


(i) Extension of the theory to include nonwhite noise. As 
mentioned in Section 2, this problem is already solved in the dis- 
crete-time case [11], and the only remaining difficulty is to get a 
convenient canonical form in the continuous-time case. 


(ii) General study of the variance equations using Lyapunov 
functions. 


(iii) Relations with the calculus of variations and information
theory.


Approximate Method for Calculating the
Time Response in Linear, Time-Varying, and 
Nonlinear Automatic Control Systems 


In past years many methods have been developed for calculating the time response in 
automatic control systems. In this paper an improved, approximate method for calculat- 
ing the transient response in linear (Section 1), time-varying (Section 2), and non- 
linear (Section 3) automatic control systems, is developed on the basis of previous works 
[1-5].2 The basis of this method lies in an approximate solution of integral equations 
by means of special tables which are given in the Appendix. These tables enable the 
user to shorten the time for calculation. This method also can be useful for obtaining 
programs for digital computers. In each part of the paper, examples are given to 
illustrate the use of this method. These examples are identical to those of Boxer and
Thaler [4] and facilitate a comparison of the solution method developed in the paper
with their z-transform approach.


1 Linear, Constant-Coefficient Automatic Control Systems 


A general idea of the method may be gained by
supposing that we have a linear automatic control system as
shown in Fig. 1, with input signal $f(t)$ for which we wish to obtain
the time response. The transfer function for this closed system is

$$K_{\mathrm{closed}}(s) = \frac{K(s)}{1 + K(s)} = \frac{E(s)}{F(s)} \tag{1}$$

$$K_{\mathrm{closed}}(s) = \frac{b_0s^m + b_1s^{m-1} + \dots + b_m}{a_0s^n + a_1s^{n-1} + \dots + a_n} \tag{2}$$

where $K(s)$ is the transfer function for the open-loop system;
$E(s)$ and $F(s)$ are Laplace transforms for $e(t)$ and $f(t)$,
respectively.

In the future we will suppose that

$$b_m = a_0 = a_n = 1$$

as it will be more convenient for calculations to follow. Division
of numerator and denominator by $s^n$ and rearranging makes it
possible to write the transfer function of equation (2) as

$$E(s) = \left[\frac{b_0}{s^{n-m}} + \frac{b_1}{s^{n-m+1}} + \dots + \frac{1}{s^n}\right]F(s) - \left[\frac{a_1}{s} + \frac{a_2}{s^2} + \dots + \frac{a_{n-1}}{s^{n-1}} + \frac{1}{s^n}\right]E(s) \tag{3}$$

To this equation will correspond the integral equation

$$e(t) = \int_0^t K'(t - \tau)f(\tau)\,d\tau - \int_0^t h'(t - \tau)e(\tau)\,d\tau \tag{4}$$

where

$$K'(t) = L^{-1}\left\{\frac{b_0}{s^{n-m}} + \frac{b_1}{s^{n-m+1}} + \dots + \frac{1}{s^n}\right\} = b_0\frac{t^{n-m-1}}{(n-m-1)!} + b_1\frac{t^{n-m}}{(n-m)!} + \dots + \frac{t^{n-1}}{(n-1)!} \tag{5}$$

and

$$h'(t) = L^{-1}\left\{\frac{a_1}{s} + \frac{a_2}{s^2} + \dots + \frac{a_{n-1}}{s^{n-1}} + \frac{1}{s^n}\right\} = a_1 + a_2t + \dots + a_{n-1}\frac{t^{n-2}}{(n-2)!} + \frac{t^{n-1}}{(n-1)!} \tag{6}$$

1 This research was conducted at the Department of Electrical
Engineering, Massachusetts Institute of Technology, March-June,
1959, while the author was a guest of the Institute.

2 Numbers in brackets designate References at end of paper.

Contributed by the Instruments and Regulators Division of The
American Society of Mechanical Engineers and presented at
the Joint Automatic Control Conference, Cambridge, Mass.,
September 7-9, 1960. Manuscript received at ASME Headquarters,
June 6, 1960. ASME Paper No. 60-JAC-10.

From these expressions we see that the functions $K'(t)$ and $h'(t)$
are sums of the power functions $t^i/i!$.

The integral equation (4) will be solved by a recursion formula
that is obtained by replacing the integrals by summations. The
value of $e(t)$ will be found at discrete times $nT$ where $T$ is a
suitably chosen time interval and $n$ is an integer. The only
approximation involved is in the replacement of the integrals by
summations, and the error so introduced is a function of the
interval $T$. By making $T$ sufficiently small the error can be
limited to any desired value.

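In deriving the desired recursion formula for $e(t)$ it is convenient
to represent the first integral in equation (4) by $\varphi(t)$; that is,

$$\varphi(t) = \int_0^t K'(t - \tau)f(\tau)\,d\tau \tag{7}$$

Replacing the integration by a summation yields

$$\varphi_n = T\left[\tfrac{1}{2}K_0'f_n + K_1'f_{n-1} + \dots + K_{n-1}'f_1 + \tfrac{1}{2}K_n'f_0\right] \tag{8}$$

where the subscript $n$ denotes the value of a function at $t = nT$;
for example,

$$\varphi_n = \varphi(nT)$$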


Now the recursion formula for $e_n = e(nT)$ is given. This
formula is obtained by replacing the second integral in equation
(4) by a summation in the manner just illustrated for $\varphi_n$ in
equation (8). Upon rearranging terms so that all those involving
$e_n$ are on the left, the following recursion formula for $e_n$ results:

$$e_n = \frac{1}{1 + \tfrac{1}{2}Th_0'}\left[\varphi_n - T\left(h_1'e_{n-1} + h_2'e_{n-2} + \dots + h_{n-1}'e_1 + \tfrac{1}{2}h_n'e_0\right)\right] \tag{9}$$

Fig. 2 Block diagram corresponding to equations (4), (5), and (6)


Using equation (8) the values $\varphi_n$ are first calculated and then
with these available, the values of $e_n$ are calculated using
equation (9). The calculations for the values $\varphi_n$ are conveniently
made using the tables of the power function as given in the
Appendix. We also use these tables for calculating the values
$h_n' = h'(nT)$. In this way we can rapidly compute the discrete
values $e_n$.

It is interesting to note that equations (4), (5), and (6) corre- 
spond to the block diagram in Fig. 2. This block diagram is a 
transformation of the previous block diagram of Fig. 1. 

Example 1. Let us illustrate the foregoing approximate method
by means of an example. Let us calculate the time response of a
second-order system whose differential equation has no damping
coefficient for $f(t) = 1(t)$ (step input signal):

$$\frac{d^2x}{dt^2} + x = 1(t) \tag{10}$$

The block diagram corresponding to this equation is shown in Fig.
3. For zero initial conditions the Laplace transform for equation
(10) is

$$s^2X(s) + X(s) = \frac{1}{s} \tag{11}$$

or after transformation we will have:

$$X(s) = \frac{1}{s^3} - \frac{1}{s^2}X(s) \tag{12}$$

Fig. 4 shows a block diagram corresponding to this equation. In
the time domain equation (12) becomes

$$x(t) = \varphi(t) - \int_0^t h'(t - \tau)x(\tau)\,d\tau \tag{13}$$

where

$$\varphi(t) = L^{-1}\left\{\frac{1}{s^3}\right\} = \frac{t^2}{2} \tag{14}$$

$$h'(t) = L^{-1}\left\{\frac{1}{s^2}\right\} = t \tag{15}$$

The approximate equation corresponding to equation (13) is

$$x_n = \varphi_n - (\Gamma_1x_{n-1} + \Gamma_2x_{n-2} + \dots + \Gamma_{n-1}x_1) \tag{16}$$

where

$$n = 1, 2, 3, \dots; \quad \Gamma_n = Th'(nT) = nT^2; \quad \varphi_n = \frac{n^2T^2}{2}$$

3 These tables were calculated at Massachusetts Institute of
Technology using the IBM 704 computer.

Fig. 3 Block diagram corresponding to equation (10)

Fig. 4 Block diagram corresponding to equation (12)

The hand calculation of the $x_n$ can be accomplished easily by
forming a convenient table (see Table 1).

In Table 2 are given the exact and approximate results for
$x(t)$. After comparing the exact and approximate values for $x(t)$,
we see that this method gives high accuracy over the time range
investigated.
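
The recursion for this example is short enough to program directly. The sketch below assumes the form $x_n = \varphi_n - (\Gamma_1x_{n-1} + \dots + \Gamma_{n-1}x_1)$ with $\Gamma_k = kT^2$, $\varphi_n = n^2T^2/2$ and $x_0 = 0$, and compares the result with the exact step response $1 - \cos t$ of $d^2x/dt^2 + x = 1(t)$. The function name is illustrative.

```python
import math


def step_response(T, n_max):
    """Approximate x(nT) for x'' + x = 1(t), zero initial conditions."""
    x = [0.0]                                    # x_0 = x(0) = 0
    for n in range(1, n_max + 1):
        phi_n = (n * T) ** 2 / 2.0               # phi(nT) = (nT)^2 / 2
        # Gamma_k = T * h'(kT) = k T^2, convolved with earlier samples
        conv = sum(k * T * T * x[n - k] for k in range(1, n))
        x.append(phi_n - conv)
    return x


x = step_response(0.5, 8)
errors = [abs(x[n] - (1.0 - math.cos(0.5 * n))) for n in range(9)]
```

With $T = 0.5$ the first samples are 0.1250, 0.4688 and 0.9453, and the error against the exact solution stays below 0.04 over the first eight steps.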


2 Linear, Time-Varying Automatic Control Systems 


Class of Control Systems Considered. The time-varying control
systems to which the approximate methods of this paper for
transient-response calculation apply can be described by three
equations. The plant is characterized by a time-varying
coefficient equation of the form:

$$\frac{d^ly}{dt^l} + A_1(t)\frac{d^{l-1}y}{dt^{l-1}} + \dots + A_{l-1}(t)\frac{dy}{dt} + A_l(t)\,y = e(t) \tag{17}$$

where $y$ is the output and $e$ is the error signal. The regulator is
described by a constant-coefficient equation of the type

$$a_0\frac{d^nx}{dt^n} + a_1\frac{d^{n-1}x}{dt^{n-1}} + \dots + a_nx = b_0\frac{d^my}{dt^m} + b_1\frac{d^{m-1}y}{dt^{m-1}} + \dots + b_my \tag{18}$$

where $n > m$. The error signal, which is the difference between
the set point (or input) $f$ and the output of the regulator, is given
by

$$e(t) = f(t) - x(t) \tag{19}$$

Fig. 5 is a block diagram of the control system that is being
considered.

Fig. 1 Basic block of control system

Fig. 5 Block diagram of control system (plant: time-varying differential equation; regulator: differential equations with constant coefficients)

In order to apply the method of this paper, equation (17) has
to be rearranged into the form

$$\frac{d^ly}{dt^l} + \frac{d^{l-1}(B_1y)}{dt^{l-1}} + \dots + B_ly = e(t) \tag{20}$$

and equation (18) has to be modified so that $a_0 = a_n = b_m = 1.0$.
After these equations have been placed in proper form a procedure
similar to that demonstrated for the linear constant-coefficient
systems in Section 1 can be applied.

Solution Procedure. The method for solving time-varying
control systems can be presented most clearly by restricting the
discussion to plants characterized by a first-order equation with a
single time-varying coefficient. Specifically, the more general
equation (17) or (20) is replaced by

$$\frac{dy}{dt} + A(t)\,y = e(t) \tag{21}$$

Equation (18) in modified form becomes

$$\frac{d^nx}{dt^n} + a_1\frac{d^{n-1}x}{dt^{n-1}} + \dots + x = b_0\frac{d^my}{dt^m} + b_1\frac{d^{m-1}y}{dt^{m-1}} + \dots + y \tag{22}$$

with $n > m$.

Let us define

$$Z(t) = A(t)\,y(t) \tag{23}$$
Table 1 Approximate solution of time response in linear second order system (T = 0.5 sec) 
0.25 0.50 0.75 1.00 1.25 1.50] 1.75 2.00} 2.25 2.50 2.75 3.00 3.25 
1 [0.125 | 0.0312 | 0.0625 | 0.0937 9.125 | 0.1562 | 0.1875 0.2187 | 0.2 0.2612 0.3125 | 0.3437 | 0.3750 0.4062 
0.5000 
2 | 0.4688 0.1172] 0.2344 0.3516| 0.4686 0.7032 | 0.620% | 0.9376 1.0548 | 1.1720 | 1.2892 1. 4064 
1.1250 
2 3 | 0.985) orl 0.2363 o.4r26| 0.7090 | 0.9653 | 1.1816 | 1.4179 | 1.6563 | 1.6906 | 2.1269 | 2.3632 | 2.5906 
2.0000 
-0. 
& | 1.4356 2.3 0.3569} 0.7178 | 1.0767 | 1.4356 | 1.7965 | 2.1534 | 275223 | 2.6712 | 3.2301 | 3.5890 
3.1250 
5 | 1-8169 ERE ©.45k2 | 0.9088 | 1.3627 | 1.6169 | 2.2711 | 2.7253 | 3.1796 | 3.6338 | 4.088 
4. 5000 
6 | 1.99% I? 0.4985 | 0.9970 | 1.4955 | 1.996 | 2.4225 | 2.9910 | 3.4895 | 3.9880 
| 6.129 | 
202% 88 5 
7 | 1.9226 | Tas 0.4806 | 0.9613 | 1.4619 | 1.9226 | 2.4032 | 2.8839 | 3.36% 
8 | 1.6206 
1 | | -6. 0.4051 0.8103 1.2154 1.€206 2.0257 2.4309 
| 
10.1250 
| 1.1634 |-8.9616 | 0.2908 0.5817 | 0.8725 | 1.1634 1.4542 
1-1 
10 | 0.6654 | a base | 0.266% | 0.3327 | 0.4991 | 0.666% 
: 1 | 0.2509 14.8742 | 0.0627 | 0.125% | 0.1880 
| 0.2509 | 
| | [17.9762 0.0080 0.0119 
0.0102 
13 | 0.0407 | | 
| 0.208 | | | Lok 


Table 2 Exact and approximate values for time response in second order linear system (T = 0.5 sec)

t (sec)    x(t) exact    x(t) approximate
0.5        0.1224        0.1250
1.0        0.4597        0.4688
1.5        0.9293        0.9453
2.0        1.4161        1.4356
2.5        1.8011        1.8169
3.0        1.9900        1.9940
3.5        1.9365        1.9226
4.0        1.6536        1.6206
4.5        1.2108        1.1634
5.0        0.7163        0.6654
5.5        0.2913        0.2509
6.0        0.0398        ...
6.5        0.0234        0.0407
7.0        0.2461        ...

Journal of Basic Engineering MARCH 1961 / 111 


Table 3 Calculation table for discrete values of $y_n$ in an automatic control system with time-varying parameters

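and obtain for equations (19), (21), and (22) the Laplace
transforms for zero initial conditions:

$$E(s) = F(s) - X(s) \tag{24}$$

$$sY(s) + Z(s) = E(s) \tag{25}$$

$$[s^n + a_1s^{n-1} + \dots + 1]\,X(s) = [b_0s^m + b_1s^{m-1} + \dots + 1]\,Y(s) \tag{26}$$

where $E(s)$, $F(s)$, $X(s)$, $Y(s)$ are the Laplace transforms for $e(t)$,
$f(t)$, $x(t)$, $y(t)$, respectively. After solving the system of the
equations (24), (25), and (26) for $Y(s)$ we will have the following
equation:

$$sY(s) + Z(s) = F(s) - \frac{b_0s^m + b_1s^{m-1} + \dots + 1}{s^n + a_1s^{n-1} + \dots + 1}\,Y(s) \tag{27}$$

or after transformation we will have

$$Y(s) = \left[\frac{1}{s} + \frac{a_1}{s^2} + \dots + \frac{1}{s^{n+1}}\right]F(s) - \left[\frac{a_1}{s} + \frac{a_2}{s^2} + \dots + \frac{a_{n+1-m} + b_0}{s^{n+1-m}} + \dots + \frac{1 + b_{m-1}}{s^n} + \frac{1}{s^{n+1}}\right]Y(s) - \left[\frac{1}{s} + \frac{a_1}{s^2} + \dots + \frac{1}{s^{n+1}}\right]Z(s) \tag{28}$$

or:

$$y(t) = \varphi(t) - \int_0^t K'(t - \tau)\,y(\tau)\,d\tau - \int_0^t h'(t - \tau)\,Z(\tau)\,d\tau \tag{29}$$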
where 
1 1 1 
= 
=l+at+ +— (30) 
n! 
1 1 
(31) 
n! 




s” gn ti 

(n — m)! 


+ (1 + Baa) + (32) 


Equation (29) was obtained for zero initial conditions. For
nonzero initial conditions the form of the integral equations will be the
same and only $\varphi(t)$ will change. The Laplace transform
for nonzero initial conditions will be

$$L\{\varphi_1\} = L\{\varphi\} + L\{\varphi_0\}$$

where $\varphi_0(t)$ = the equivalent input signal, which corresponds to
the nonzero initial conditions. For the zero-initial conditions
$\varphi_0(t) = 0$. The integral equation (29) corresponds to the
following approximate formulas for calculating the discrete values $y_n$:


(33) 


+ eee + (34) 
where 
n = 1,2,3,... 
Ko Ty aq 
= 1 — — — om 
a +- 2 + 2 1+T 
= y(nT); K, = TK,’; = Th,’ = K'(nT); (35) 


h,’ = h'(nT); A, = A(nT) 


To calculate the values of y, one must: 


1 Calculate the discrete values A,, I’,, and K,, for the fixed 
value of 7. For these calculations it is more convenient to use a 
table of the power functions (see Appendix). 

2 Calculate the discrete values of the equivalent input signal 
o(t), where 


>, = the | 
n= 1,2,3,... (36) 
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| 

| 

be 

4 


3 Combining the results of 2 and 1, calculate $y_n$ using equation (34).

The calculations can be arranged in tabular form to facilitate the summing operations. Examples of tables prepared for this problem are shown as Tables 3 and 4.

It is noted that in the more general case where there is more than one time-variable parameter the value of each of these parameters must be calculated at the discrete times. This is accomplished simply by preparing a table for each time-variable parameter similar to Table 3. It is interesting to observe that an increase in the order of the differential equation with respect to $y$ does not require any alteration of procedure.

Table 4 Auxiliary calculation table for discrete values of $y_n$ in an automatic control system with time-varying parameters

Example 2. Consider the approximate solution of the following time-varying differential equation:

$$\frac{dy}{dt} + t\,y = t, \qquad t > 0 \tag{37}$$

with the zero-initial condition $y(0) = 0$. The exact solution for equation (37) is

$$y(t) = 1 - e^{-t^2/2} \tag{38}$$

First define

$$t\,y(t) = Z(t) \tag{39}$$

and obtain the Laplace transform of equation (37) as

$$sY(s) + Z(s) = \frac{1}{s^2} \tag{40}$$

or

$$Y(s) = \frac{1}{s^3} - \frac{1}{s}Z(s) \tag{41}$$

From this equation there follows the integral equation

$$y(t) = \varphi(t) - \int_0^t h'(t-\tau)\,Z(\tau)\,d\tau \tag{42}$$

where

$$\varphi(t) = \mathcal{L}^{-1}\left[\frac{1}{s^3}\right] = \frac{t^2}{2} \tag{43}$$

$$h'(t) = \mathcal{L}^{-1}\left[\frac{1}{s}\right] = 1 \tag{44}$$

The approximation formula corresponding to equation (42) is then

$$a_n y_n = \varphi_n - T\left(Z_{n-1} + Z_{n-2} + \dots + Z_1\right) \qquad \text{for } n = 1, 2, 3, \dots \tag{45}$$

where $Z_k = kT\,y_k$ and

$$a_n = 1 + \frac{nT^2}{2}$$
The table used for calculating the first 8 values of $y_n$ is given as Table 5 for $T = 0.4$. Table 6 compares the results of the approximate solutions for $T = 0.4$ and $T = 0.8$ with the exact solution.
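The recursion of Example 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `example2_response` and the running-sum bookkeeping are not from the paper, and the summation is taken in the form $a_n y_n = \varphi_n - T^2 \sum_{k<n} k\,y_k$, which is what $a_n = 1 + nT^2/2$ and the entries of Table 5 imply.

```python
import math

def example2_response(T, steps):
    """Approximate y(nT) for dy/dt + t*y = t, y(0) = 0 (hypothetical helper)."""
    y = [0.0]            # y_0 from the zero initial condition
    weighted_sum = 0.0   # running sum of k * y_k for k = 1 .. n-1
    for n in range(1, steps + 1):
        phi_n = (n * T) ** 2 / 2.0        # phi(t) = t^2 / 2 sampled at t = nT
        a_n = 1.0 + n * T ** 2 / 2.0      # coefficient a_n of the text
        y_n = (phi_n - T ** 2 * weighted_sum) / a_n
        y.append(y_n)
        weighted_sum += n * y_n
    return y

T = 0.4
approx = example2_response(T, 8)
for n, y_n in enumerate(approx):
    t = n * T
    exact = 1.0 - math.exp(-t * t / 2.0)   # equation (38)
    print(f"t = {t:.1f}  approx = {y_n:.3f}  exact = {exact:.3f}")
```

Because each new value needs only the running sum of earlier values, the work per step stays constant, which is the same economy the tabular layout gives in a hand calculation.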
Table 5 Calculation table for values of y, (n = 1, 2, 3,...) in Example 2 for a system with time-varying 
parameters (T = 0.4 sec) 
n ay a y A r, rs r, 
7 1 1.08 0.07% 0.4 0.030 0.012 0.012 0.012 0.0l2 0.012 0.012 | 0.012 
0.320 | 
2 | 0.308 1.16 0.265 0.6 | 0.212 | -0.012 | 0.085 0.085 | 0.08 0.085 0.085 | 0208 
0-308 
1 0.720 
3 | 0.623 | 1.26 | 2.2 | 0.602 0.240 | 0.240 | | oveko | 0.2h0 
.280 
a 0.943 1.32 0.713 1.6 1.141 233 0.456 0.456 0.456 | 0.456 
| 2.000 
| > 1.207 | 1.40 | 0.660] 2.0 | 1.72 | 0.688 0.436 | 0.456 
2.880 if 
6 1.399 1.48 0.945 2.4 2.268 -1.461 0.907 | 0.907 
1.532 1.56 | 0.983] 2.8 | 2.752 Lt 1.202 
5.120 
8 | 1.631 1.64 0.99% -3. 
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Table 6 Exact and approximate values for y(t) of Example 2 for system
with time-varying parameters


t 
(sec) 0} 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 


y(t) 0 
(exact 
solution ) 


0.997 


y(t) 
(approx. 
} solution 
T=0.4) 


0.945 0.995 


| 0-4) 

| y(t) Zz 


3 Nonlinear Automatic Control Systems 


In references [2, 3, 4] are outlined methods which can be used 
to obtain approximations to the transient response of nonlinear 
automatic control systems. In each case the response is deter- 
mined at discrete points in time. Some of these methods require 
for computation of the discrete points the solution of a set of non- 
linear algebraic equations in the dependent variable. Also, some 
of these methods require that the impulse response characteristic 
of the open-loop system be computed for use in the over-all solu- 
tion. 

In this section it will be shown how the method set forth in the 
first part of this paper can be used to obtain approximations to 
the transient response of a nonlinear automatic control system. 
This will be accomplished by means of an example. 

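Example 3. Let the nonlinear system be described by the differential equation

$$\frac{dy}{dt} + y^2 = 1 \tag{46}$$

with zero-initial conditions. The exact solution of this equation is

$$y(t) = \tanh t \tag{47}$$

First define

$$Z(t) = y^2(t) \tag{48}$$

and obtain the Laplace transform of equation (46) as

$$sY(s) + Z(s) = \frac{1}{s} \tag{49}$$

or

$$Y(s) = \frac{1}{s^2} - \frac{1}{s}Z(s) \tag{50}$$

The integral equation corresponding to equation (50) is

$$y(t) = \varphi(t) - \int_0^t h'(t-\tau)\,Z(\tau)\,d\tau \tag{51}$$

where

$$\varphi(t) = \mathcal{L}^{-1}\left[\frac{1}{s^2}\right] = t \tag{52}$$

$$h'(t-\tau) = \mathcal{L}^{-1}\left[\frac{1}{s}\right] = 1 \tag{53}$$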


Using the summation approximation of equation (8) to the integral of equation (51) there results

$$y_n + \frac{T}{2}\,h_0'\,y_n^2 = \varphi_n - T\left(h_1' y_{n-1}^2 + h_2' y_{n-2}^2 + \dots + h_{n-1}' y_1^2\right), \qquad n = 1, 2, 3, \text{etc.} \tag{54}$$

where

$$y_n = y(nT), \qquad h_n' = 1, \qquad \varphi_n = nT$$

To solve equation (54) for $y_n$ requires the solution of a second-order algebraic equation in $y_n$.
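To avoid solving this nonlinear equation for each value of $y_n$, the following artifice is introduced: Divide each side of equation (50) by $s$ to obtain

$$\frac{1}{s}Y(s) = \frac{1}{s^3} - \frac{1}{s^2}Z(s) \tag{55}$$

The integral equation corresponding to this equation is

$$\int_0^t K(t-\tau)\,y(\tau)\,d\tau = \varphi(t) - \int_0^t h'(t-\tau)\,Z(\tau)\,d\tau \tag{56}$$

where

$$K(t) = \mathcal{L}^{-1}\left[\frac{1}{s}\right] = 1 \tag{57}$$

$$\varphi(t) = \frac{t^2}{2} \tag{58}$$

$$h'(t) = t \tag{59}$$

The approximate formula for calculating $y_n$ is then

$$\tfrac{1}{2}\Gamma_0 y_n + \Gamma_1 y_{n-1} + \dots + \Gamma_{n-1} y_1 = \varphi_n - \left(h_1 y_{n-1}^2 + h_2 y_{n-2}^2 + \dots + h_{n-1} y_1^2\right) \qquad \text{for } n = 1, 2, 3, \dots \tag{60}$$

where

$$\Gamma_n = T K(nT) = T, \qquad h_n = T h'(nT) = nT^2$$

yielding

$$\frac{T}{2}\,y_n = \varphi_n - T\left(y_{n-1} + y_{n-2} + \dots + y_1\right) - \left(h_1 y_{n-1}^2 + h_2 y_{n-2}^2 + \dots + h_{n-1} y_1^2\right) \tag{61}$$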

Thus $y_n$ is given by linear equation (61) in $y_n$ and is solvable directly. Example tables prepared for the repetitive solution of this equation are given as Tables 7 and 8. Comparison with the exact solution values is shown in Table 9.
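The repetitive solution of Example 3 can be sketched in Python as follows. It is an illustration rather than the author's tabular layout: the function name `example3_response` is hypothetical, and the weights $\Gamma_k = T$ and $h_k = kT^2$ with a half weight on the newest ordinate are assumed here because they reproduce the first entries of Tables 7 and 8 ($y_1 = 0.2$, $y_2 = 0.384$, $y_3 = 0.541$).

```python
import math

def example3_response(T, steps):
    """Approximate y(nT) for dy/dt + y^2 = 1, y(0) = 0 (hypothetical helper)."""
    y = [0.0]  # y_0 from the zero initial condition
    for n in range(1, steps + 1):
        phi_n = (n * T) ** 2 / 2.0                    # phi(t) = t^2 / 2
        linear_sum = T * sum(y[k] for k in range(1, n))           # Gamma_k = T
        square_sum = sum((n - k) * T ** 2 * y[k] ** 2             # h_(n-k) = (n-k) T^2
                         for k in range(1, n))
        # The newest ordinate enters linearly with weight T/2, so no
        # quadratic equation has to be solved at any step.
        y.append((phi_n - linear_sum - square_sum) / (T / 2.0))
    return y

T = 0.2
approx = example3_response(T, 10)
for n, y_n in enumerate(approx):
    print(f"t = {n * T:.1f}  approx = {y_n:.3f}  exact = {math.tanh(n * T):.3f}")
```

The squared ordinates enter only with $k < n$ because $h'(0) = 0$; this is what makes the division by $s$ remove the nonlinearity from the unknown $y_n$.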


A method has been presented which makes possible an orderly 
calculation of the transient response of linear, time-varying, and 
nonlinear systems. The method and the accuracy of its ap- 
proximation have been demonstrated by three example problems. 
The method is especially suited to hand calculations. Tables 
are given which aid greatly the hand computations. 
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Table 7 Calculations for time-response of nonlinear system of Example 3 for (T = 0.2 sec) 


a Yn r; ry rs [% Ts 
1 |0.02 0.2 


©. 1800 0.108 
3 0.0541 0.541 0.22% 
0.3200 


§ 


5 [0.0763 0.763 


ojoo 


6 |0.0834 | 0.63% 0.1668 
0.9800 
0.1 
7 | 0.0886 0.886 “9. 1s 
1.2600 
8 |0.0919 | 0.919 -1.1861 
0.0988 | 
1.620 
9 |0.09k8 | 0.948 | 
10 |0.0959 | 0.959 | 
-l. 1 
| | 
Table 8 Calculations for time-response of nonlinear system of Example 3 (T = 0.2 sec) 
0.0% 0.08 0.12 0.16 0.20 0.2% 0.28 0.32 0.36 
F. 1] 0.2 0.04 0.0016 | 0.0032 | 0.0048 0.0064 | 0.0080 | 0.0096 | 0.0112 | 0.0128 | 0.0144 
0.0016 
2} 0.384 | 0.1675 | 9-0059 | 0-0118 | 0.0177| 0.0236 | 0.0295 | 0.0354 | 0.0413 | 0.0672 
0.0091 
3| 0.541 | 0.2927 +0.1168 | 0.0117 | 0.0234| 0.0351 | 0.0668 | 0.0585 | 0.0702 | 0.0820 
78 089 1068 
0.667 | +0. 0.01 0.0356 | 0.0534 | 0.0712 | 0. 
| 
0.0653 
5| 0.763 0.5814 “ope 0.0233 | 0.0465 | 0.0698 | 0.093 0.1163 
| 
6| 0.834 | 0.6956 a 78 56 08 
. . +0.5110 | 0.0278 | 0.05) 0.0835 | 0.1213 | 
| 
7| 0. 0. +0. 0.0314 | 0.0628 | 0.0942 | 
0.929 | 0.8446 | | 0.0338 | 0.0676 | 
4 + {|__| | 
| 0. 
9) 0.948 0.8987 | | +1.0 0.0359 | 
| | | | | 
| 0.6757 
¢ 20! 0.959 | |+2.2284 
References

1 A. A. Krasovskij and G. S. Pospelov, "Einige Methoden zur Berechnung angenaeherter Zeitcharacteristiken bei linearen Systemen automatischer Regelung," Avtomatika i Telemekhanika, vol. 13, 1953.

Table 9 Comparison of exact and approximate responses of nonlinear system of Example 3
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10 0.9640 0.959 378-389. 
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APPENDIX 


Power functions for approximate solutions, transient responses in linear systems with constant parameters, 
time-varying parameters, and nonlinear systems 


$$\frac{t^K}{K!} = n \times 10^{-m} \qquad \text{for } K = 1, 2, 3, 4, 5, 6$$

Columns: t, then the pair (n, m) for each of K = 2, K = 3, K = 4, K = 5, K = 6
0.02 0.500000 4 0.166667 6 0.416667 9 0.633333 12 0.138889 14 
4 0.02 0.200000 3 0.133333 5 0.666667 8 . 206667 10 0.888889 13 
J 0.03 0.450000 3 0.450000 5 0.337500 7 0.202500 9 0.101250 11 
0.04 0.800000 3 0.106667 0.106667 6 0.853333 a 0.568389 | 
| 0.05 0.125000 2 0.208333 4 0.260417 6 0.260417 8 0.217014 10 
| 0.06 0.180000 2 0.360000 4 0.540000 6 0.648000 8 0.648000 10 
4 0.07 0.245000 2 0.571667 4 0.100042 5 0.140058 7 04163402 9 
q 0.08 0.320000 2 0.853333 . 0.170667 5 0.273067 7 0.364089 9 » 
0.09 0.405000 2 0.121500 3 0.273375 5 0A 92075 7 0.732112 9 
i 0.10 0.500000 2 0.166667 3 0.416667 5 0.833333 7 0.138089 8 
| 0-11 0.605000 2 0.221833 3 0.610042 5 0.134209 6 0.245050 8 
0.12 0. 720000 2 0.288000 3 0.864000 5 0.207360 6 0.814720 8 » 
0.13 0.845000 2 0.366167 3 0.119004 4 0.309411 6 0.670390 6 
1 0.14 0. 980000 2 0.457335 3 0.160067 4 0.448187 6 0.104577 7 
4 0.15 0.112500 1 0.562500 3 0.210937 4 0.632612 6 0.158203 7 
0.16 0.12800 1 0.682667 3 0.273067 0.875815 6 0.233017 7 
uJ 0.17 0.144500 1 0.818833 3 0. 348004 4 0.118321 5 0.335244 7 
I 0.18 0.162000 1 0.972000 3 0.477400 4 0.157464 5 0.472392 7 
0.19 00180500 1 0.114317 2 0.543004 0.206341 5 0.653415 7 
0.20 0. 200000 1 0.133333 2 0. 666667 . 0.266667 5 0.888889 7 
0.23 0.220500 1 0.154350 2 0.810337 0.340342 0.119120 6 
0.22 0.242000 2 0.177467 2 0.976066 ry 0.429869 5 0.157472 6 
y 0.23 0. 264500 i 0.202783 2 0.116600 3 0.536362 5 0.205605 6 
0.24 0. 288000 1 0.230400 2 0.138240 3 0.663552 5 0.265421 6 
0.25 0.312500 1 0.260417 2 0.162760 3 0.81380< 5 0.339084 6 
0.26 0.338000 1 0.292933 2 0.190407 3 0.990114 5 0.429050 6 
0.27 0.364500 1 0.326050 2 0.221834 3 +119574 4 0.538084 6 
0.28 0.392000 1 0.365867 2 0.256106 3 0.143420 4 0.669292 6 
0.29 0. 420500 1 0.406483 z 0. 294700 3 0.170926 4 0.826143 6 
0.30 0.450000 1 0.850000 2 0.337500 3 0.202500 4 0.101250 5 
0.31 0.480500 2 0.496517 2 0. 384800 3 0.238576 4 0.123264 5 
Y 0.32 0.512000 1 0.546133 2 0.436907 3 0.279620 . 0.199130 5 
iF 0.33 0.544500 1 0.598950 2 0.494134 3 0.326128 4 0.179370 5 
’ 0.34 0.578000 1 0.655066 2 0.556806 3 0.378628 4 0.214556 5 
0.35 0.612500 1 0.714583 2 0.625260 3 0.437682 4 0.255314 5 
0.36 0.648000 1 0.777600 2 0.699840 3 0.503885 4 0.302331 5 
0.37 0.684500 1 0.844216 2 0.780900 3 0.577866 4 0.356351 5 
0.38 0.722000 1 0.914533 2 0.868806 3 0.660233 4 0.418185 5 
0.39 0. 760500 1 0.988650 2 0.963933 3 0.751868 4 0.488714 5 
‘ 0.40 0.800000 1 0.106667 1 0.106667 2 0.853333 ~ 0.568889 5 
; 0.41 0.880509 1 0.114868 1 0.117740 2 0.965468 ry 0.659736 5 
0.82 0.882000 1 0.123480 1 0.129654 2 0.108910 3 0.762365 5 
0.43 0.924500 1 0.132512 1 0.142450 2 0.122507 3 0.877966 5 
0.44 0.968000 1 0.141973 1 0.156171 2 0.137430 3 04100782 4 
0.45 0.101250 ° 0.151875 2 0.170859 2 0.153773 3 0.115330 4 
is 0.46 0.105800 te) 0.162227 1 0.186561 2 0.171636 3 0.131587 4 
te 0.47 0.110450 fe) 0.173038 1 0.203320 2 0.191121 3 0.149711 5 
0.48 0.115200 ° 0.184320 1 0.221188 2 0.212336 3 0.169869 4 ~ 
f 0.49 0.120050 ) 0.196082 1 0.280200 2 0.235396 3 0.192280 4 
0.50 0.125000 0 0.208333 1 0.260416 2 0.260416 3 0.217018 4 
0.51 0.130050 ts) 0.221085 1 0. 281883 2 0.267521 3 0.284393 5 
0.52 0.135200 ) 0.238347 1 0. 304650 2 0.316836 3 0.278592 4 ' 
0.53 0.140450 t) 0.248128 1 0.328770 2 0.348496 3 0.307838 4 
0.54 0. 185800 0 0.262440 1 0. 54294 2 0.382637 3 0.344373 4 
0.55 0.151250 ft) 0.277291 2 0.381276 2 0.419403 3 0.384453 4 
0.56 0.156800 ° 0.292693 1 0.409770 2 0.858943 3 0.428347 5 
> 0.57 0.162450 ts) 0.308655 1 0.439833 2 0.501410 3 0.476340 4 
0.58 0.168200 fe) 0.325186 1 0.471520 2 0.546965 3 0.528731 4 
0.59 0.178050 ) 0.342298 1 0.504890 2 0.595770 3 0.585840 5 
0.60 0.180000 ° 0.360000 1 0.540000 2 0.648000 3 0.715560 4 
> 0.61 0.186050 t) 0.378301 1 0.576910 2 0.703830 3 0.715560 4 
‘ 0.62 0.192200 fr) 0.397213 1 0.615680 2 0.763443 3 0. 788891 4 
0.63 0.198450 ts) 0.416785 1 0.656373 2 0.827030 3 0.868381 4 
0.64 0. 208800 0.456906 1 0.699050 2 0.894784 3 0.954436 
2 0.5 0.211250 ° 0.457708 1 0.743776 2 0.966908 3 0.104748 3 
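The entries of this table can be regenerated with a short Python sketch, assuming each entry is $t^K/K!$ written as a six-place mantissa $n$ in the range 0.1 to 1 and a power-of-ten exponent $m$, which is what the legible rows suggest. The function name `power_entry` is illustrative and not part of the paper.

```python
import math

def power_entry(t, K):
    """Return (n, m) such that t**K / K! = n * 10**(-m) with 0.1 <= n < 1."""
    value = t ** K / math.factorial(K)
    m = 0
    while value < 0.1:    # shift the mantissa up into [0.1, 1)
        value *= 10.0
        m += 1
    while value >= 1.0:   # shift down if the value is 1 or larger
        value /= 10.0
        m -= 1
    return value, m

for t in (0.01, 0.40, 0.65):
    row = "  ".join("{:.6f} {:2d}".format(*power_entry(t, K)) for K in range(2, 7))
    print(f"{t:.2f}  {row}")
```

For $t = 0.01$ and $K = 2$ this gives a mantissa of 0.500000 with $m = 4$, matching the first legible row of the table.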
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‘ 
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K=2 


n 

0. 217800 
0.228450 
0.231200 
0.238050 
0.245000 
0.252050 
0.259200 
0.266450 
0.273800 
0.281250 
0. 288800 
0. 296450 
0. 304200 
0.312050 
0.320000 

328050 
0. 336200 
0. 344450 
0.352800 
0.361250 
0. 369800 
0. 378450 
0. 387200 
0.39605 
0.405000 
0.414050 
0.423200 
0.432450 
0.441800 
0.451250 
0.460800 
0.470850 
0.480200 
0.490050 
0.500000 


n 
0.479160 
0.501271 
0.528053 
0.547515 
0.571666 
0.596518 
0.622080 
0.648361 
0.675373 
0.703125 
0.731626 
0. 760888 


0.853333 
0.885735 
0.918946 
0.952978 
0.987839 
0.102354 
0.106009 
0.109750 
0.113579 
0.117495 
0.121500 
0.125595 
0.129781 
0.134059 
0.138431 
0.142896 
0.147456 
0.152112 
0.156865 
0.161716 
0.166667 


8 


DISCUSSION 
R. E. Bach⁴


In recent years there have been several methods developed for 
numerical solution of differential equations using the Laplace 
and z-transforms as their basis [6-11].⁵ It is well known that the
Laplace and z-transform methods are useful for obtaining analytic 
solutions of linear, constant-coefficient differential and difference 
equations, respectively. The transform approach provides a 
powerful tool for analysis and synthesis of physical systems that 
can be described by such equations. It is the purpose of this dis- 
cussion, however, to question the extension of these techniques for 
approximate numerical solution of linear, nonlinear, and time- 
varying differential equations when there exists a great wealth 
of generally proved, relatively straightforward numerical analysis 
methods. The application of transform techniques for numerical 
solutions suffers from the drawbacks of a specialized language, 
difficult-to-justify mathematical manipulations, and the lack of
meaningful accuracy estimates. In comparison, the techniques 
of numerical analysis are found in a common language with re- 
liable error estimates carried along in the development of each 
method. The following is intended to partially support this 
argument. 

Of the many techniques of numerical analysis available for the 
solution of ordinary differential equations, the method of suc- 
cessive extrapolation [12] is one of the easiest to develop. The 
essential feature of this approach consists of replacing the given 
differential equation by an equivalent difference equation in the 
form of a recursion relation among a certain number of successive 


4 Assistant Professor of Research in Communications, Northeastern University, Boston, Mass.
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0.790615 
0.839629 
0.690890 
0.944463 
0.100042 
0.105882 
0.111974 
0.118326 
0.128944 
0.131836 
0.139009 1 0.211294 
0.146471 0. 225565 
0.790920 1 0.154229 1 0. 240598 
0.821731 1 0.162292 0. 256421 
0.170667 0.273066 
0.179361 0. 290565 
0.188384 1 0. 308950 
0.197743 0. 328253 
0.207646 1 0. 348510 
0.217502 i C. 369754 
0.227920 0.392022 
0.238707 1 0.415350 
0. 249873 0.439776 
0. 261425 0.465338 
0.273375 0.492075 
0.285729 i 0.520026 
0.298497 i 0.549234 
0. 311688 1 0.579740 
0.325312 1 0.611586 
0.339377 1 0.644817 
0. 353894 2 0.679477 
0. 368872 0.715611 
0. 384320 0.753267 
0.400248 1 0.792491 
0.416666 0.833333 


K=4 K=6 


n 
0.104361 
0.112510 
0.121161 
0.130336 
0.140058 
0.150352 
0.161243 
0.172756 
0.184917 
0.197754 


n 
0.114797 
0.125636 
0.137316 
0.149886 
0.163401 
0.177917 
0.193492 
0.210186 
0.228064 
0.247192 
0.267638 
0.289475 
0.312777 
0.337621 
0.364088 
0.392263 
0.422231 
0.454085 
0.487913 
0.523818 
0.561898 
0.602258 
0.685005 
0.690251 
0.736112 
0. 788706 
0.882159 
0.898597 
0.956151 
0.102096 
0.108716 
0.115690 
0.123033 
0.130761 
0.138889 


values of the dependent variable. This method will be demon- 
strated with the solution of two of the first-order equations solved 
by Boxer and Thaler [7] and again by the author of this paper 
using transform techniques. Thus, a comparison of the methods 
can be made and some conclusions drawn concerning their use 
and application. 

For simplicity in presentation, a three-point approximation will be used, i.e., a second-order difference equation (involving three ordinates) will generate the numerical solution. In order to eliminate a first-derivative term, consider that a solution $y(t)$ can be expanded in a Taylor series such that

$$y(t+T) = y(t) + T y'(t) + \frac{T^2}{2} y''(t) + \frac{T^3}{6} y'''(t) + \dots \tag{62}$$

Since

$$y(t+T) - y(t-T) = 2T y'(t) + \frac{T^3}{3} y'''(t) + \dots, \tag{63}$$

a difference expression for the first derivative, involving an error of the order of $T^3$, is

$$T y'(t) = \frac{1}{2}\left[y(t+T) - y(t-T)\right] \tag{64}$$

In similar fashion an expression for the second derivative could be derived. However, since the examples to be solved are only first-order, Eq. (64) will suffice.

The first equation considered is the nonlinear differential equa- 
tion 
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t 
n m 
0.66 
0.67 
0.68 4 
0.69 3 
0.70 3 
0.71 3 : 
0.72 
0.73 4 
0.74 3 
0.75 | 
0.76 3 
0.77 3 
: 0.78 3 
0.79 3 
0.80 
0.81 
0.82 3 
0.83 3 
0.04 5 
0.85 3 
0.86 3 
0.87 5 
0.88 
0.89 
0.90 5 
4 0.91 5 a 
0.92 
0.93 
0.95 2 
0.96 “2 
0.97 2 q 
0.98 2 
0.99 2 
1.00 2 x 
3 


$$y'(t) + y^2(t) = 1, \qquad y(0) = 0. \tag{65}$$

Substitution of $y'(t)$ from Eq. (64) into Eq. (65) yields, after some rearrangement, the recursion formula

$$y(t+T) = 2T\left[1 - y^2(t)\right] + y(t-T). \tag{66}$$

In more convenient form (with $t$ replaced by $t + T$), Eq. (66) is written as

$$y(t+2T) = 2T\left[1 - y^2(t+T)\right] + y(t). \tag{67}$$

Since Eq. (67) is a second-order recursion formula, two values must be available to start the solution. The first value is given by the initial condition $y(0) = 0$, while $y(T)$ can easily be obtained by a few terms of a Taylor series expanded about $t = 0$. Choosing $T = 0.2$ sec, the first two non-zero terms yield $y(T) = 0.1973$. The solution can now be easily continued either by manual or automatic computation. A comparison of a few points of this solution with that of the author is shown in Table 10. It is seen that, for this case, the method of successive extrapolation is not quite so accurate as that of Naumov. However, the recursion formula is so simple that halving the sampling interval would provide much increased accuracy with a small amount of extra effort. Alternatively, of course, a higher-order recursion formula could be derived, using an approach similar to that outlined.
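For automatic computation, the recursion of Eq. (67) might be coded as in the following Python sketch. The function name `successive_extrapolation` is an assumption made for illustration; the starting value uses the first two non-zero Taylor terms of the solution, $T - T^3/3$, which gives the 0.1973 quoted above for $T = 0.2$.

```python
import math

def successive_extrapolation(T, steps):
    """Three-point recursion y(t+2T) = 2T[1 - y(t+T)^2] + y(t) for y' + y^2 = 1."""
    y = [0.0, T - T ** 3 / 3.0]   # y(0) and the two-term Taylor start for y(T)
    for n in range(2, steps + 1):
        y.append(2.0 * T * (1.0 - y[n - 1] ** 2) + y[n - 2])
    return y

T = 0.2
y = successive_extrapolation(T, 10)
for n, y_n in enumerate(y):
    print(f"t = {n * T:.1f}  recursion = {y_n:.3f}  tanh(t) = {math.tanh(n * T):.3f}")
```

Halving the interval only requires calling the function with `T = 0.1` and twice as many steps, which is the small extra effort referred to in the text.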

A second example is the linear, time-varying differential equation

$$y'(t) + t\,y(t) = t, \qquad y(0) = 0. \tag{68}$$

For this example a more accurate three-point approximation can be obtained. From Eq. (63) there is obtained

$$6T y'(t) + T^3 y'''(t) = 3\left[y(t+T) - y(t-T)\right]. \tag{69}$$

Another equation involving $y'(t)$ and $y'''(t)$ can be derived in the following manner: substitution of $(t + T)$ and $(t - T)$ for $t$ in Eq. (68) and adding the two resulting equations yields

$$y'(t+T) + y'(t-T) = 2t - (t+T)\,y(t+T) - (t-T)\,y(t-T); \tag{70}$$

differentiating both sides of Eq. (62), writing the result for $t + T$ and $t - T$, and adding the two resulting equations yields

$$y'(t+T) + y'(t-T) = 2y'(t) + T^2 y'''(t) + \dots \tag{71}$$

Equating the right-hand sides of Eqs. (70) and (71) there is obtained a second equation in terms of $y'(t)$ and $y'''(t)$, which is

$$2y'(t) + T^2 y'''(t) = 2t - (t+T)\,y(t+T) - (t-T)\,y(t-T). \tag{72}$$

Eqs. (69) and (72) can now be solved simultaneously for $y'(t)$ to yield a difference equation approximation that is accurate to the order of $T^5$:

$$y'(t) = \frac{1}{4T}\left\{\left[T(t+T) + 3\right]y(t+T) + \left[T(t-T) - 3\right]y(t-T) - 2tT\right\}. \tag{73}$$

Substituting $y'(t)$ from Eq. (73) into Eq. (68) and replacing $t$ by $t + T$ results in the three-point recursion formula

$$y(t+2T) = \frac{(3 - tT)\,y(t) + 2T(t+T)\left[3 - 2y(t+T)\right]}{3 + T(t+2T)}. \tag{74}$$
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Table 10 Numerical solution of $y'(t) + y^2(t) = 1$, $y(0) = 0$


  t     Analytic   Transform method   Successive extrapolation
 0.0     0.000          0.000                 0.000
 0.2     0.197          0.200                 0.197
 0.4     0.380          0.384                 0.384
 0.6     0.537          0.541                 0.538
 0.8     0.664          0.667                 0.669
 1.0     0.762          0.763                 0.759
 1.2     0.834          0.834                 0.838
 1.4     0.885          0.886                 0.879
 1.6     0.922          0.919                 0.929
 1.8     0.947          0.948                 0.933
 2.0     0.964          0.959                 0.981

Table 11 Numerical solution of $y'(t) + t\,y(t) = t$, $y(0) = 0$


  t     Analytic   Transform method   Successive extrapolation
 0.0     0.000          0.000                 0.000
 0.4     0.077          0.074                 0.074
 0.8     0.274          0.265                 0.274
 1.2     0.513          0.501                 0.513
 1.6     0.722          0.713                 0.722
 2.0     0.865          0.860                 0.864
 2.4     0.944          0.945                 0.944
 2.8     0.980          0.983                 0.980
 3.2     0.994          0.995                 0.995


Choosing $T = 0.4$ and again computing $y(T)$ by the first two non-zero terms of a Taylor series expansion about $t = 0$, the solution is continued with the use of Eq. (74). A comparison with the author's method is shown in Table 11. The results show that the technique of successive extrapolation provides better accuracy over the range. The improved accuracy over the first example is due to the use of the additional constraints imposed in the use of Eq. (68) at $t + T$ and $t - T$, made feasible by the linearity of the equation.
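The recursion of Eq. (74) lends itself to the same kind of short program. The Python sketch below is illustrative only: the function name `three_point_solution` is hypothetical, and the starting value $y(T) = T^2/2 - T^4/8$ is the two-term Taylor expansion of the exact solution $1 - e^{-t^2/2}$ about $t = 0$.

```python
import math

def three_point_solution(T, steps):
    """Eq. (74) applied to y' + t*y = t, y(0) = 0, returning y(0), y(T), ..., y(steps*T)."""
    y = [0.0, T ** 2 / 2.0 - T ** 4 / 8.0]   # y(0) and the Taylor start for y(T)
    for n in range(steps - 1):
        t = n * T                            # y[n] = y(t), y[n + 1] = y(t + T)
        numerator = (3.0 - t * T) * y[n] + 2.0 * T * (t + T) * (3.0 - 2.0 * y[n + 1])
        y.append(numerator / (3.0 + T * (t + 2.0 * T)))
    return y

T = 0.4
y = three_point_solution(T, 8)
for n, y_n in enumerate(y):
    t = n * T
    print(f"t = {t:.1f}  recursion = {y_n:.3f}  exact = {1.0 - math.exp(-t * t / 2.0):.3f}")
```

The coefficients depend on $t$ but the formula stays explicit in $y(t + 2T)$, so the time-varying parameter adds no iteration to the step.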

In conclusion, it should be pointed out that the method presented here does not necessarily represent the "optimum" numerical solution in terms of time and accuracy. The "optimum" method differs from one equation to the next, and its
however, that the engineer in need of a numerical solution to a 
differential equation would better consult a standard text on 
numerical analysis rather than rely on an approximation of linear 
transform algebra. Of definite value in this area would be a 
course in numerical analysis for the undergraduate engineer, 
especially at those universities where digital computer pro- 
gramming is being taught. 
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Introduction 


eT hydraulic systems are used 
widely in many relatively low-power, fast-response applications. 
The power efficiency of such systems is generally poor due to the 
very nature of the control valve which functions by its throttling 
action. Improving the power efficiency means reduction in size 
and weight of the hydraulic system, which often includes relatively heavy equipment such as a power source (e.g., battery and electric motor), a hydraulic pump, an accumulator, and a "cooling" system. A small percentage of reduction in size and
weight can be very significant in many airborne applications. 
A smaller hydraulic power supply system also means less initial 
cost as well as less running cost which are important considera- 
tions in industrial applications. 
Conventional control valves are so constructed that the orifice 
area of the valve varies according to the input signal (hence- 
forth, they will be referred to as area-controlled valves). The 
flow which the valve controls is not only a function of input sig- 
nal but is also a function of other variables such as load pressure 
(i.e., pressure drop across the actuator), and supply pressure. The 
purpose of this paper is to demonstrate that this relatively simple 
method of control is not necessarily the best way of control from 
either the power-efficiency or the performance viewpoint. The 
reasoning is that in the control process, both the load pressure 
and the supply pressure can change considerably. The gain of 
the valve (i.e., flow variation per unit signal variation) therefore 
is not a constant. Specifically, the gain of the valve under the 
full-load condition is less than that under the no-load condition. 
Frequently, it is not possible to satisfy a stability requirement 
when the load force is small (and valve gain is high) and at the 
same time satisfy the dynamic-response requirement when the 
load force is high (and valve gain is low). To satisfy both con- 
ditions, the designer has to use a very large actuator which means 
some sacrifice in the over-all efficiency. A constant-gain control 
valve is one in which the flow output is dependent only on the in- 
put signal to the valve but is independent of other variables (e.g., 
load variation and supply pressure variation). This property can 
be achieved by using a flow feedback arrangement so that the 


Contributed by the Instruments and Regulators Division of The
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Fig. 1 Schematic diagram of a position-control system 
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Improvement of the Power Efficiency 
of a Hydraulic Control System by the 
Use of a Gain Compensated Control Valve 


flow (not orifice area) is the quantity to be controlled. The gain 
of the valve is thus constant under all conditions. The im- 
provement of the control system can be viewed in either of the 
following ways: For a given set of performance specifications, 
it is possible to use a lighter hydraulic power source. For a given 
hydraulic power supply system, it is possible to have better dynamic performance under given load conditions. A description of the flow-feedback valve and its steady-state response was given by Bahniuk and Lee.¹ The dynamic response of systems using such valves was also discussed by the same authors.²,³


Construction of a Valve-Controlled Hydraulic System 


Hydraulic-control systems are used to control position, velocity, 
force, or other mechanical quantities. The most widely used is 
perhaps the position-control system, the construction of which is 
shown schematically in Fig. 1. The hydraulic power supply 
usually is the constant-pressure type. A constant flow-type hy- 
draulic system is simpler and less expensive but has many draw- 
backs which make it unsuitable for applications where high per- 
formance and compact size are important factors. The simplest 
constant-pressure supply is perhaps the simple accumulator 
which is often used in the so-called one-shot system. To store 
maximum energy in a given accumulator, the volume of the oil 
should be approximately two thirds ([e − 1]/e to be exact, where e = 2.7183) of the total volume. Under this condition, the pressure at the end of the operation is approximately one third of that at the beginning of the operation. Only by reducing the volume of oil stored, and consequently the duration of operation, can a better constancy of supply pressure be obtained. For applications which require long operating time, the accumulator required will be too bulky to be practical. A continuous-fluid
supply system, as shown in Fig. 2, can be used. Here the pres- 
sure is maintained constant by feeding it back to control the 
variable-displacement pump. An accumulator is often used to 
accommodate the peak flow condition. If it is sufficiently large, 
the capacity of the battery, motor, and pump, need only to be 
large enough to supply the average flow requirement. However, 


1 E. Bahniuk and S.-Y. Lee, "The Design and Analysis of a Servovalve With Flow Feedback," Journal of Basic Engineering, Trans. ASME, series D, vol. 82, 1960, pp. 73-80.

2 S.-Y. Lee, "A Servovalve With Flow Feedback and the Dynamic Performance of a System Consisting of the Valve and an Inertia Load," Proceedings, First International Congress of the International Federation of Automatic Control, Moscow, USSR, June, 1960.

3 E. Bahniuk, "Application of Servovalves With Flow Feedback," Proceedings, National Conference on Industrial Hydraulics, Illinois Institute of Technology, Chicago, Ill., 1959.


Fig. 2 Schematic diagram of a continuous fluid power-supply system
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sometimes the duty cycle is such that peak flow duration is too 
long and the accumulator required is too big to be practical. 
Then, the designer will have to make a compromise between the
over-all weight of the system and the constancy of the supply 
pressure. Since the performance of an area-controlled valve de- 
pends on the supply pressure, reduction of pressure due to in- 
sufficient accumulator capacity can cause some deterioration in 
performance of the system. 
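The two-thirds oil fraction quoted above for the simple accumulator can be checked numerically. The Python sketch below rests on assumptions the text does not state explicitly: the gas charge is taken to expand isothermally as an ideal gas, the initial pressure is fixed, and the accumulator is emptied completely. The function name `delivered_energy` and the normalized values `p0 = 1` and `volume = 1` are illustrative.

```python
import math

def delivered_energy(gas_fraction, p0=1.0, volume=1.0):
    """Isothermal work done by gas expanding from gas_fraction * volume to volume."""
    return p0 * volume * gas_fraction * math.log(1.0 / gas_fraction)

# Scan the initial gas fraction of the total volume on a fine grid.
fractions = [i / 10000.0 for i in range(1, 10000)]
best_gas = max(fractions, key=delivered_energy)
best_oil = 1.0 - best_gas      # oil fraction that stores the most energy
pressure_ratio = best_gas      # isothermal: p_end / p_start = V_gas_start / V_total

print(f"oil fraction   = {best_oil:.4f}   (e - 1)/e = {(math.e - 1.0) / math.e:.4f}")
print(f"pressure ratio = {pressure_ratio:.4f}   1/e       = {1.0 / math.e:.4f}")
```

Under these assumptions the maximum falls at a gas fraction of 1/e, so the oil occupies (e − 1)/e of the volume and the final pressure is 1/e, or roughly one third, of the initial pressure.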

The construction of the control valve used can be of many de- 
signs. Valves that have a closed-center output stage are more 
efficient since their stand-by power loss is smaller. The pressure 
gain of this type of valve is also higher than an open-center one. 
Only this kind of valve will be discussed in this paper. Compari- 
son will be made between the so-called orifice-area-controlled 
valve and flow-feedback valve. The schematic construction of 
both types of valves is shown in Fig. 3. 
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Fig. 3(b) Schematic diagram of the flow-feedback valve 


The steady-state, pressure-flow characteristic curves of both 
types of valves at two different supply pressures are shown in 
Figs. 4 and 5. From these figures, it is seen that, for an area- 
controlled valve, the output flow varies not only with the input 
signal but also varies with the supply pressure and the load pres- 
sure. For a flow-feedback valve, the flow is almost independent 
of supply pressure and load pressure within the useful range of 
the valve. These are the characteristics of the valve which ac- 
count for the improved power efficiency of the control system. 


Performance Requirements of a Hydraulic Position-Control 
System and the Design Procedure 


Consider a position-control system, as shown in Fig. 1. The 
load forces acting on the actuator can be classified as follows: 


1 Coulomb friction. The magnitude of this type of force is 
independent of the position and the velocity of the actuator but 
it always acts in a direction opposite to the direction of the ve- 
locity. The level of the force generally varies considerably with 
the wear condition, as well as with the environmental condition; 
e.g., temperature. A large part of this type of frictional force is 
often due to the actuator itself. 

2 Viscous-type frictional force. The magnitude of this type 
of frictional force is a function of the velocity. This function 
can change greatly with temperature and other environmental 
conditions during the operation of the servo. 

3 Spring-type force. The magnitude of this type of force is 
a function of the position of the actuator. A typical example is 
the aerodynamic force acting on the control surfaces of an air- 
plane or a missile. 
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4 Random external forces. These are forces required to perform a specific function of the control system. Examples are
gravitational force of a weight-lifting servo, the cutting force of 
a machine-tool control servo, wind load on an antenna drive, 
etc. These forces are not necessarily a function of speed or posi- 
tion of the actuator. 

5 Inertia forces. These are forces due to the mass of the load 
which seldom changes with environmental conditions. When the 
inertia force is reasonably large, as is often the case, it becomes 
an important limiting factor of the dynamic response of the 
system. In this paper, this is assumed to be the situation. In 
other words, it is assumed that the inertia force is always present 
and has a dominating effect on the system performance. 

The parameters used to specify the performance of a system are 
usually as follows: 


1 Maximum travel of the actuator. 

2 Load sensitivity of the actuator. 

3 Stability margin of the system due to small perturbation 
under the worst combination of loading and initial conditions. 

4 Dynamic response due to a small input signal under the 
worst combination of loading and initial conditions. (These 
conditions are not necessarily the same as in item 3.) 

5 Full-input velocity of the actuator under the worst combina- 
tion of loading and initial conditions. This requirement is also 
related to the maximum level of input signal before the valve 
saturates. 
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Fig. 4 Pressure-flow-input characteristic curves for an area-controlled
valve at two supply-pressure levels
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Fig. 5 Pressure-flow-input characteristic curves for a flow-feedback 
valve at two supply-pressure levels 
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Due to the rather complicated relationships between the 
various requirement parameters, trial-and-error procedures have 
been found to be the quickest in designing the system. The pro- 
cedure is outlined as follows: 


1 Make a trial assumption of the maximum working pres- 
sure across the load. This pressure should be a fraction of the 
supply pressure. 

2 Determine the maximum possible load force the actuator 
sees at any time. This force should include a possible combina- 
tion of all forces (except inertia force) that can act simultane- 
ously. The inertia force is automatically considered when the 
dynamic condition is investigated. 

3 Determine the actuator area which is equal to the quotient 
of the maximum load force and the design load pressure. 

4 Determine the size of the valve orifice at full input con- 
dition. The size of orifices should be large enough that under 
maximum load condition 3 the maximum velocity requirement is 
satisfied. 

5 Knowing the area of the ram, the maximum travel of the ram, 
the inertia load, and damping forces, it is possible to determine 
the open-loop response of the valve-actuator-load system.‘ 

6 Determine the minimum loop gain $K_m$ of the closed-loop system required to meet the dynamic-response requirement.

7 Determine the maximum loop gain $K_M$ of the system to satisfy the stability requirement.

8 In order to satisfy both 6 and 7, the actual gain must be greater than $K_m$ but less than $K_M$.

The ratio $K_M/K_m$ is the maximum allowable gain ratio of the system. In other words, valve gain can vary by the ratio of $K_M/K_m$ and still satisfy the dynamic and stability requirements.

9 By the analysis described in the next section, it is possible 
to determine the maximum load pressure of the system once the 
gain ratio and the characteristics of the load are known. 

10 If the load pressure obtained in 9 differs from the assump- 
tion made in 1, revise the load-pressure assumption and repeat 
the process until the correct load pressure is found. 
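
The loop formed by steps 1, 3, and 6 through 10 can be sketched as a fixed-point iteration. This is only an illustrative sketch: the function name, the two callables, and the toy numbers are assumptions introduced here, not part of the procedure as published. Steps 5 through 7 (the dynamic analysis) are hidden inside the hypothetical `gain_limits` callable, step 9 inside `pressure_ratio_for_gain_ratio`, and step 4 (orifice sizing) is omitted.

```python
def iterate_load_pressure(supply_pressure, max_load_force, gain_limits,
                          pressure_ratio_for_gain_ratio,
                          initial_ratio=0.5, tol=1e-4, max_iter=100):
    """Repeat steps 1-10 until the assumed load pressure reproduces itself.

    gain_limits(area) -> (k_min, k_max)          hypothetical stand-in for steps 5-7
    pressure_ratio_for_gain_ratio(gamma) -> a    hypothetical stand-in for step 9
    """
    a = initial_ratio                                  # step 1: load pressure / supply pressure
    for _ in range(max_iter):
        load_pressure = a * supply_pressure
        area = max_load_force / load_pressure          # step 3: actuator area
        k_min, k_max = gain_limits(area)               # steps 6 and 7
        gamma = k_max / k_min                          # step 8: allowable gain ratio
        a_new = pressure_ratio_for_gain_ratio(gamma)   # step 9
        if abs(a_new - a) < tol:                       # step 10: assumption confirmed
            final_pressure = a_new * supply_pressure
            return final_pressure, max_load_force / final_pressure
        a = a_new                                      # step 10: revise and repeat
    raise RuntimeError("load-pressure iteration did not converge")


# Toy usage with made-up numbers: constant gain limits and a simple step-9 relation.
pressure, area = iterate_load_pressure(
    supply_pressure=3000.0,
    max_load_force=6000.0,
    gain_limits=lambda area: (1.0, 2.0),
    pressure_ratio_for_gain_ratio=lambda gamma: 1.0 - 1.0 / gamma**2,
)
```

In a real design the gain limits would change with actuator area, so more than two passes would normally be needed.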


Determination of the Working Pressure Across the Load 
for an Area-Controlled Valve 


The maximum working pressure across the actuator (or the 
load) of a servo system can be determined when the maximum 
allowable gain ratio is established from the dynamic-response 
considerations. Other factors influencing the load pressure are 


4 J. L. Shearer, “Dynamic Characteristics of Valve-Controlled 
Hydraulic Servomotors,” Trans. ASME, vol. 76, 1954, pp. 895-904. 
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Fig. 6 Load line and static operating points for viscous-type friction 
load 
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the load characteristics, the pressure-flow-input characteristics 
of the valve, and the maximum supply pressure variation. The 
determination of the working pressure for several kinds of load 
will be discussed as follows: 

1 Viscous-Type Friction. The pressure-flow characteristic of 
the load is a straight line passing through the origin. When it is 
superimposed on the pressure-flow-input characteristic curves of 
the valve, the intersection points determine the operating conditions 
corresponding to various input-signal levels, Fig. 6. However, 
the pressure-flow-input curves of the valve are, strictly 
speaking, for steady-state operating conditions only. Thus the 
afore-mentioned method of determining the operating conditions 
is a good approximation only when the input-signal variation is 
considerably slower than the response of the valve. The maximum-gain 
condition occurs at the origin, where the variation of 
flow per unit variation of signal is greatest. The minimum-gain 
condition occurs in either the first or third quadrant (points A 
and B in Fig. 6), where the variation of flow per unit variation of 
signal is lowest. 

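The pressure-flow-input characteristic curves for a four-way, 
closed-center area-controlled valve are shown in Fig. 4 and can 
be represented by the following equation: 

$$Q_L = K X \left(P_S - \frac{X}{|X|} P_L\right)^{1/2} \tag{1}$$

where 

$Q_L$ = flow through the load 
$K$ = valve constant 
$X$ = input signal 
$P_L$ = pressure across the load 
$P_S$ = supply pressure 

Differentiating (1) 

$$dQ_L = K \left(P_S - \frac{X_0}{|X_0|} P_{L0}\right)^{1/2} dX - \frac{K |X_0|}{2 \left(P_S - \frac{X_0}{|X_0|} P_{L0}\right)^{1/2}}\, dP_L \tag{2}$$

where $X_0$, $P_{L0}$, and $Q_{L0}$ represent a set of consistent values of valve 
displacement, load pressure, and load flow, respectively. Since 
the operating points are always on the load line 

$$\frac{dP_L}{dQ_L} = \frac{P_{L0}}{Q_{L0}} = \frac{1}{S} \tag{3}$$

where $S$ is the slope of the load line. Also 

$$Q_{L0} = K X_0 \left(P_S - \frac{X_0}{|X_0|} P_{L0}\right)^{1/2} \tag{4}$$

Substituting equations (3), (4) in equation (2) 

$$\frac{dQ_L}{dX} = \frac{2K \left(P_S - \frac{X_0}{|X_0|} P_{L0}\right)^{3/2}}{2\left(P_S - \frac{X_0}{|X_0|} P_{L0}\right) + \frac{X_0}{|X_0|} P_{L0}} \tag{5}$$

Since the operating points are always in the first or third 
quadrants, $P_{L0}$ and $X_0$ always have the same sign. The foregoing 
equation reduces to 

$$\frac{dQ_L}{dX} = \frac{2K \left(P_S - |P_{L0}|\right)^{3/2}}{2P_S - |P_{L0}|} \tag{6}$$

The maximum gain occurs when the input signal $X = 0$. Under 
this condition $P_L$ is also zero. 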
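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The maximum gain is therefore 

$$\left(\frac{dQ_L}{dX}\right)_{\max} = K P_S^{1/2} \tag{7}$$

The minimum gain occurs when the load pressure is maximum, or 
when $|P_{L0}| = |P_{LM}|$; then 

$$\left(\frac{dQ_L}{dX}\right)_{\min} = \frac{2K\left(P_S - |P_{LM}|\right)^{3/2}}{2P_S - |P_{LM}|} \tag{8}$$

where $P_{LM}$ = maximum load pressure. If the supply pressure 
also varies, the minimum gain would occur when the supply pressure 
is at the minimum $P_S'$ and at the same time the input signal 
is a maximum, so that $P_{L0} = P_{LM}$, where $P_S'$ denotes the minimum 
possible supply pressure: 

$$\left(\frac{dQ_L}{dX}\right)_{\min} = \frac{2K\left(P_S' - |P_{LM}|\right)^{3/2}}{2P_S' - |P_{LM}|} \tag{9}$$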


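The maximum gain ratio $\gamma$ is the ratio of equations (7) and (9) 

$$\gamma = \frac{K P_S^{1/2}\left(2P_S' - |P_{LM}|\right)}{2K\left(P_S' - |P_{LM}|\right)^{3/2}} = \frac{2p - a}{2(p - a)^{3/2}} \tag{10}$$

where $a = |P_{LM}|/P_S$ and $p = P_S'/P_S$. A plot of equation (10) is 
shown in Fig. 7. 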
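
As a rough numerical companion to Fig. 7, the sketch below evaluates the viscous-load gain ratio and inverts it by bisection. It assumes the gain ratio has the form $\gamma = (2p - a)/[2(p - a)^{3/2}]$, which rises steadily with $a$ for $0 \le a < p$; the function names are illustrative only.

```python
def viscous_gain_ratio(a, p=1.0):
    """Gain ratio for a viscous load: (2p - a) / (2 (p - a)^(3/2)), valid for 0 <= a < p."""
    return (2.0 * p - a) / (2.0 * (p - a) ** 1.5)


def viscous_max_load_ratio(gamma_allowed, p=1.0, tol=1e-9):
    """Largest load-pressure ratio a whose gain ratio does not exceed gamma_allowed."""
    if viscous_gain_ratio(0.0, p) > gamma_allowed:
        return 0.0                      # even zero load pressure exceeds the allowance
    lo, hi = 0.0, p                     # the gain ratio grows without bound as a -> p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if viscous_gain_ratio(mid, p) <= gamma_allowed:
            lo = mid
        else:
            hi = mid
    return lo
```

With a constant supply pressure ($p = 1$) and an allowable gain ratio of 2, the usable load pressure comes out somewhat below half the supply pressure.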

2 Coulomb-Type Friction. The load line for this type of friction 
is the zig-zag line shown in Fig. 8. If the friction characteristic 
were absolutely constant, the gain would always be equal to the 
spacing of the valve characteristic curves along line A-B or line 
C-D. However, in designing such a system, one cannot count 
on the friction characteristic remaining constant. It often 
changes considerably with environmental condition, wear, and 
lubrication condition. One conservative design assumption is 
that the friction can vary from zero up to a 
certain maximum value. It is then possible to derive an equation 
relating the gain ratio, the maximum load pressure $P_{LM}$, and 
the variation of supply pressure $p$. The equation for the 
characteristic curves of the valve is 

$$Q_L = K X \left(P_S - \frac{X}{|X|} P_L\right)^{1/2} \tag{1}$$

The maximum valve gain occurs when $P_L = 0$ 

$$\left(\frac{dQ_L}{dX}\right)_{\max} = K\left(P_S - 0\right)^{1/2} = K P_S^{1/2} \tag{11}$$

The gain of the valve is minimum when $P_L = P_{LM}$ and when the 
supply pressure $= pP_S$, 

$$\left(\frac{dQ_L}{dX}\right)_{\min} = K\left(pP_S - |P_{LM}|\right)^{1/2} \tag{12}$$

The gain ratio is 

$$\gamma = \frac{K P_S^{1/2}}{K\left(pP_S - |P_{LM}|\right)^{1/2}} = \frac{1}{(p - a)^{1/2}} \tag{13}$$

The relationship of equation (13) is shown in Fig. 9. 
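
For Coulomb-type friction the relation can be inverted in closed form. The sketch below assumes the gain ratio is $\gamma = 1/(p - a)^{1/2}$, so that $a = p - 1/\gamma^2$; the function names are illustrative.

```python
import math


def coulomb_gain_ratio(a, p=1.0):
    """Gain ratio for Coulomb friction: 1 / sqrt(p - a), valid for 0 <= a < p."""
    return 1.0 / math.sqrt(p - a)


def coulomb_max_load_ratio(gamma_allowed, p=1.0):
    """Largest load-pressure ratio a allowed by a given gain ratio (never below zero)."""
    return max(0.0, p - 1.0 / gamma_allowed**2)
```

For example, an allowable gain ratio of 2 with constant supply pressure permits a load pressure of three quarters of the supply pressure.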

3 Random Load. One common form of random load is a constant 
force acting on the actuator. The load line is then a vertical 
line such as A-B, as shown in Fig. 10. The valve gain is a maximum 
in the fourth quadrant, where the spacing is the largest, and 
minimum valve gain occurs in the first quadrant, where the spacing 
of the curves is closest. The relationship between gain ratio and 
load pressure can be derived as follows: 

From equation (2) 
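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$$\frac{dQ_L}{dX} = K\left(P_S - \frac{X_0}{|X_0|} P_{L0}\right)^{1/2} \tag{14}$$

The maximum valve gain occurs when $X_0$ is negative and $P_{LM}$ 
is positive or vice versa. 

$$\left(\frac{dQ_L}{dX}\right)_{\max} = K\left(P_S + |P_{LM}|\right)^{1/2} \tag{15}$$

The minimum valve gain occurs when both $X_0$ and $P_L$ are 
positive or when both are negative, and also when the supply 
pressure is at the minimum 

$$\left(\frac{dQ_L}{dX}\right)_{\min} = K\left(pP_S - |P_{LM}|\right)^{1/2} \tag{16}$$

$$\text{gain ratio} = \gamma = \frac{K\left(P_S + |P_{LM}|\right)^{1/2}}{K\left(pP_S - |P_{LM}|\right)^{1/2}} = \left(\frac{1 + a}{p - a}\right)^{1/2} \tag{17}$$

The plot of equation (17) is shown in Fig. 11. 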
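
The random-load relation can also be inverted directly. This sketch assumes the gain ratio is $\gamma = [(1 + a)/(p - a)]^{1/2}$, which gives $a = (\gamma^2 p - 1)/(\gamma^2 + 1)$; the function names are illustrative.

```python
import math


def random_load_gain_ratio(a, p=1.0):
    """Gain ratio for a constant-force load: sqrt((1 + a) / (p - a)), for 0 <= a < p."""
    return math.sqrt((1.0 + a) / (p - a))


def random_load_max_ratio(gamma_allowed, p=1.0):
    """Largest load-pressure ratio a allowed by a given gain ratio (never below zero)."""
    g2 = gamma_allowed**2
    return max(0.0, (g2 * p - 1.0) / (g2 + 1.0))
```

For the same allowable gain ratio of 2, this load type permits a load-pressure ratio of 0.6, lower than the 0.75 obtained for Coulomb friction under the same assumptions.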


Spring-Type Load 

The load line for a spring-type load is harder to define than the 
three previous cases, since the shape of the load line depends on 
the mode of motion of the load. If, for instance, the motion of 
the load is a sinusoidal function of time of frequency $\omega_1$, the load 
line is an ellipse, as shown by curve 1 in Fig. 12. If the frequency 
increases to $\omega_2$ while the amplitude remains constant, the load 
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Fig. 7 Plots showing gain ratio versus supply-pressure and load-pres- 
sure levels—viscous load 
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line is curve 2. For a given maximum load pressure $P_{LM}$, the 
maximum valve gain occurs at regions immediately above point 
A or below point B in Fig. 12, and a minimum valve gain occurs 
at regions immediately below point A and above point B. The 
maximum gain variation is indicated by equation (17). 


Determination of the Working Pressure Across the Load 
for the Constant-Gain-Type Valve 


The working pressure for the constant-gain-type valve can be 
determined relatively easily. As long as the load pressure $P_L$ is 
within the linear range of the valve, the gain ratio is unity. When 
$P_L$ goes beyond the linear range of the valve, the behavior becomes 
similar to that of an area-controlled valve. If the valve 
selected has a fairly large linear range, e.g., 85 or 90 per cent, 
one can neglect completely the nonlinear region. The relationship 
between the load-pressure ratio $a$, the gain ratio $\gamma$, and the 
supply-pressure variation $p$ is independent of the load characteristics. 
This relationship is shown in Fig. 13. 


Power Efficiency of a Servo System and the Maximum 
Working Pressure Across the Load 


The efficiency of a hydraulic control system at any instant can 
be defined as the ratio of the mechanical power output to the 
hydraulic power input: 

$$\eta_i = \frac{P_L Q_L}{P_S Q_L} = \frac{P_L}{P_S} \tag{18}$$
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Fig. 9 Plots showing gain ratio versus supply-pressure and load-pres- 
sure levels; Coulomb-type friction 


Fig. 10 Load line for random-type load 


Journal of Basic Engineering 


where $\eta_i$ is the instantaneous efficiency. 

Equation (18) applies only to systems using an ideal closed- 
center four-way valve, and a 100 per cent efficient actuator. 
Under this condition, the total system flow is equal to the load 
flow. For underlapped valves, the efficiency is lower than that 
indicated by equation (18). 

Equation (18) indicates that for two position-control systems 
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Fig. 11 Plots showing gain ratio versus supply-pressure and load- 
pressure levels; random load and spring-type load 
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Fig. 12 Load line for spring-type load 


Fig. 13 Plots showing gain ratio versus supply-pressure and load- 
pressure levels; constant-gain valve 
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performing the same control operation, the one with the higher 
load pressure has higher instantaneous efficiency. 

The over-all average efficiency of the system over a period of 
time is equal to the ratio of the total work accomplished during 
the time period to the total hydraulic work input during the 
same time period: 

$$\eta_{av} = \frac{\int_{t_1}^{t_2} |V \cdot F|\, dt}{\int_{t_1}^{t_2} |V \cdot A \cdot P_S|\, dt} = \frac{\int_{t_1}^{t_2} |V \cdot F|\, dt}{A P_S \int_{t_1}^{t_2} |V|\, dt} \tag{19}$$

where 

$\eta_{av}$ = average efficiency during the time period $t_1$ to $t_2$ 
$V$ = instantaneous velocity of the actuator 
$F$ = instantaneous force acting on the actuator 
$A$ = area of the actuator 

From equation (19), it is seen that the power efficiency of a 
system cannot be determined unless the duty cycle and input 
function are known exactly. However, for the purpose of this 
paper, the main interest is to compare the power efficiency of two 
position-control systems performing the same duty with identical 
input but having different types of valves. Since these are posi- 
tion-control systems, the error between the output and the input 
of the system is not likely to be very much for ordinary duties. 
Thus if the input function to both systems is identical, the out- 
put of both systems can be assumed to be identical. So will be 
the numerator of equation (19). (An exception to this condition 
is when the input function changes very rapidly; e.g., a step 
function. Under this condition, the actuator of one system can 
travel considerably more than that of the other if the former 
happens to be more oscillatory, and the value of the $\int_{t_1}^{t_2} |V|\, dt$ 
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term of equation (19) will be much greater for the former system.) 
The ratio of the average efficiency for the two systems is thus 
inversely proportional to the denominators of equation (19) 
or inversely proportional to the area of the actuator. In the previous 
discussion, it has been shown that, when using a flow-feedback 
valve, it is possible to use a higher load pressure (by using a 
smaller actuator area) than when an area-controlled valve is used. 
Thus it is possible to achieve higher efficiency with the flow-feedback 
valve. 
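
The area argument can be checked numerically. The sketch below assumes the average efficiency is the integral of $|V \cdot F|$ divided by $A P_S$ times the integral of $|V|$, with a constant supply pressure, and evaluates it for two actuators driven through the same sampled motion; the helper names and the sample numbers are illustrative only.

```python
def trapezoid(values, times):
    """Trapezoidal integral of sampled values over the given time stamps."""
    return sum(0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


def average_efficiency(times, velocity, force, area, supply_pressure):
    """Mechanical work out divided by hydraulic work in, for a closed-center valve."""
    work_out = trapezoid([abs(v * f) for v, f in zip(velocity, force)], times)
    work_in = supply_pressure * area * trapezoid([abs(v) for v in velocity], times)
    return work_out / work_in


# Same duty cycle (made-up samples), two different actuator areas.
t = [0.0, 1.0, 2.0]
v = [1.0, 1.0, 1.0]
f = [100.0, 100.0, 100.0]
eff_large_area = average_efficiency(t, v, f, area=0.2, supply_pressure=2000.0)
eff_small_area = average_efficiency(t, v, f, area=0.1, supply_pressure=2000.0)
```

Halving the actuator area doubles the load pressure needed for the same force and, under these assumptions, doubles the average efficiency.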


Conclusion 


When a closed-center four-way valve is used for position con- 
trol, the flow gain of the valve can vary with load variation as well 
as supply pressure variation. The allowable gain variation of a 
system depends on the stability requirement and the dynamic- 
response requirement. If the dynamic requirement is very 
stringent, the allowable gain variation will be small and vice 
versa. It also can be proved that the allowable flow-gain varia- 
tion depends on the level of the working pressure which is closely 
related to the power efficiency of the system. Everything else 
being equal, the higher the load pressure, the more efficient will 
be the system. The main advantage of a flow-feedback valve is 
that the flow gain is constant over a wide load-pressure band and 
a wide supply pressure band. It is possible to have the best 
power efficiency without sacrificing the dynamic performance. 
The net gain in power efficiency of a constant-gain valve over an 
area-controlled valve depends on the load characteristics and on the 
dynamic requirements. 
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Preliminary Design of a Roll-Stabilization System for a Ship 


DAVID V. STALLARD 
Senior Research Engineer, 
Feedback Controls Inc., 
Natick, Mass. 


The roll stabilization of a ship with servo-operated fins is analyzed as a system problem 
from the viewpoint of a control engineer. The preliminary design of a roll-damping 
system for a particular passenger ship is set forth. The roll-angle response to a sinusoidal 
sea-torque is analyzed, and the improvement made by the roll-damping system 
is evaluated. A rate gyro of good resolution and a damped pendulum are adequate for 
roll-sensing instrumentation. A preliminary design for a valve-controlled fin servo is 
made, but it is so wasteful of power that a pump-controlled fin servo is designed as an 
alternative. Energy aspects of control systems in general are considered briefly. 


Introduction 


The history of attempts to stabilize ships against 
rolling is quite varied and interesting, and it goes back at least as 
far as 1883. Water chambers, pumped tanks, moving weights, 
and even immense gyroscopes have been used. Almost every 
modern system of roll damping uses servo-controlled fins in pairs, 
whose motion is ordered by instrumentation such as gyroscopes 
that sense the vessel’s rolling. Acting like small aircraft wings, 
the fins develop hydrodynamic lift and cause a roll moment that 
counteracts the ship’s roll. The British have made most of the 
progress in the roll-damping field and have outfitted the great 
majority of ships, including the Queen Mary. Wallace [1]¹ has 
candidly described the development of the British Denny-Brown 
System and Bell [2, 3, 4, 5] has developed the Muirhead gyro con- 
trols for that type of system. Chadwick has evaluated various 
systems and analyzed the early Sperry Gyrofin [6] Ship Stabilizer 
in a very comprehensive paper [7] that lists 53 references to the 
art. 
The afore-mentioned systems are designed for large vessels 
and employ variable-displacement pumps and ram-operated, 
flapped fins that may be retracted into the hull of the ship. In 
contrast, Vosper Ltd., of Portsmouth, England, has developed 
a roll-damping system [8] for medium-sized or smaller ships, such 
as the yacht Christina [9]. Size and cost are important con- 
siderations, and the installations to date have used ram-operated, 
nonretractable, unflapped fins with servo valves. Fig. 1, section 
of ship through fins, shows a typical placement of fins [8]. 

This paper treats, strictly from a control engineer’s point of 
view, the analytical problem of roll stabilization. In particular, 
a preliminary design of a roll-damping system for a passenger 
vessel with low inherent damping is worked out, with the objec- 
tive of using modern electro-hydraulic techniques to achieve a 
reliable, effective system of low cost. Vosper has determined 
certain important parameters, including the following: Each of 
the two fins shall have an area of about 30 ft² and a mass-moment- 
of-inertia of 177 lb-ft-sec²; the steady-state hydrodynamic load 
torque on the fin shall vary with angle as the curve in Fig. 2, with 
a maximum that is estimated to be 130 ton-in.; the servo must be 
capable of simple harmonic motion from hard up to hard down 
(60 degrees total) in 2 seconds; at 30 deg fin angle, the lift forces 
on the fin cause a rolling moment of 1330 ton-ft. (The long ton of 


1 Numbers in brackets designate References at end of paper. 
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2240 lb is used.) The specified fin motion calls for a frequency 
of 0.25 cycle/sec, a maximum angular velocity (at neutral) of 
0.824 radian/sec, and a maximum angular acceleration (at maximum 
angle) of 1.29 radian/sec². 
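
The kinematic figures follow from simple harmonic motion of 30 deg amplitude with a 4-sec period (hard up to hard down in 2 sec). The short sketch below recomputes them; the function name is illustrative, and the computed peak velocity comes out near 0.82 radian/sec, close to the quoted value.

```python
import math


def fin_shm_peaks(total_travel_deg=60.0, half_cycle_time_s=2.0):
    """Frequency, peak velocity, and peak acceleration for sinusoidal fin motion."""
    amplitude = math.radians(total_travel_deg / 2.0)   # 30 deg peak, in radians
    period = 2.0 * half_cycle_time_s                   # hard up to hard down is half a cycle
    omega = 2.0 * math.pi / period                     # radian/sec
    frequency = 1.0 / period                           # cycle/sec
    peak_velocity = amplitude * omega                  # occurs at neutral
    peak_acceleration = amplitude * omega**2           # occurs at maximum angle
    return frequency, peak_velocity, peak_acceleration


frequency, peak_velocity, peak_acceleration = fin_shm_peaks()
```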
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Fig. 1 Section of ship through fins 
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Fig. 2 Steady-state hydrodynamic torque on fin versus fin angle 
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Furthermore, a rate gyro is to be used to detect the roll velocity, 
and its signal will command full fin angle for a roll velocity of 0.86 
degree/second, corresponding to a 3-degree out-to-out roll at 
the ship’s natural period of 11 sec. Further characteristics of 


the ship are set forth in the paragraph “Dynamic Response of 
Ship.” 


Energy Aspects of Roll-Damping System and Other Control 
Systems 


From the viewpoint of a control engineer, the roll-damping 
system is a large regulator system that comprises: 


1 The ship itself, i.e., its response to a roll moment; 

2 The roll-sensing instruments that produce a feedback sig- 
nal; 

3 The fin servomechanism that positions the fin; 

4 The fin dynamics, i.e., the relation of fin angle to the roll 
moment that it produces on the ship. 


See the block diagram in Fig. 3. As a regulator, the roll-damp- 
ing system is required to keep the roll angle nearly zero, despite 
the disturbing sea moment. Note that the weak electrical signals 
from the roll-sensing instruments control the powerful fin servo- 
mechanism, whose fin utilizes the ship’s propulsion power to 
produce a corrective roll moment. Thus considerable power 
amplification and modulation of energy sources are involved. 


Fig. 3 Block diagram of roll-damping system 


Broadly speaking, a servomechanism or any other system using 
“true” (as distinguished from “conceptual-only”) feedback must 
produce its controlled output by utilizing energy from a power 
source other than the power source associated with its input 
signal. Thus a feedback loop must have at least one “generalized 
power amplifier,” which is defined as a device with (1) an input 
signal; (2) an output signal which is a single-valued function 
(with the exception of moderate hysteresis) of input signal and 
independent load forces; (3) a power source or sources which is 
modulated by the input signal to produce the output signal; (4) 
dependent load forces (like inertia), which depend on the output 
signal; (5) independent load forces which are output disturbances 
that do not depend on the output signal. This definition em- 
braces devices with electrical, magnetic, mechanical, hydraulic, 
pneumatic, or thermal components, or combinations thereof, 
with or without feedback loops. Generalized power amplifiers 
(GPA) may be classified according to their methods of modulat- 
ing power as (1) dissipative GPA, in which source power is 
liberally dissipated for effective control of output signal; or (2) 
switching GPA, in which source power is rapidly switched on and 
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off; or (3) efficient GPA, in which source power flows continuously 
to the output, without major losses. 

The author believes that there are important and fundamental 
things to be learned about the energy processes in generalized 
power amplifiers, e.g., relation of power gain and response in a 
given GPA. Tustin [10] has explored the latter relation for d-c 
generators. Present knowledge of GPA’s, although limited, 
enables rational decisions to be made in basic servo design, e.g., 
a limitation on power gain for each GPA. Perhaps future knowl- 
edge about the nature of generalized power amplifiers will lead 
to new and useful configurations. A paper on the foregoing ideas 
is planned for the future. The present literature of automatic control 
is replete with signal theory, i.e., block diagrams for signals 
only, network compensation, statistical theory, etc. Too little 
thought is given to actual GPA hardware and energy processes. 


Analysis and Design of Roll-Damping System 


The following sections develop the transfer functions in the 
block diagram, Fig. 3. The over-all frequency response of the 
roll-damping system is derived, and the roll angle for the particular 
example is evaluated by comparison with $\theta_{off}$, the roll angle with 
fins locked at neutral. The means of improvement include (1) 
increasing the loop ratio $Y$ of the roll-damping system; (2) adding 
“feed-forward compensation,” if it is possible. Preliminary 
designs for two types of fin servos are given. 

Dynamic Response of Ship. We begin the analysis of the 
roll-damping system by considering the dynamic response of the ship 
itself, in Fig. 1. (See also reference [7].) It may be shown that, if 
a net roll moment $T_n$ (due to the sea slope and/or the fins) is 
applied to the ship, then the equation of roll is: 

$$J_s \frac{d^2\theta}{dt^2} + B_s \frac{d\theta}{dt} + K_s \theta = T_n \tag{1}$$

where $\theta$ is the roll angle measured from the vertical, $J_s$ is the 
moment of inertia of the ship about its c.g., $B_s$ is the rather slight 
natural damping factor of the hull, and $K_s$ is the effective spring 
constant. The term $K_s\theta$ is the natural righting moment of the 
ship and is equal to: 

$$K_s \theta = W \cdot GM \cdot \theta \tag{2}$$

where $W$ is the displacement (weight) of the ship and $GM$ is the 
metacentric height, i.e., height of the metacenter above c.g. [11]. 

The net roll moment may be expressed as the difference between 
the disturbing sea moment $T_s$ and the counteracting fin 
moment $T_f$: 

$$T_n = T_s - T_f \tag{3}$$

which may be approximated as: 

$$T_n = W \cdot GM \cdot \Gamma + B_s \frac{d\Gamma}{dt} - T_f \tag{4}$$

where $\Gamma$ is the sea slope, i.e., angle of sea surface with horizontal, 
and cross-coupling effects [7] are neglected. 

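From equation (1), it is apparent that the transfer function of 
the ship is: 

$$G_s = \frac{\theta}{T_n} = \frac{1}{K_s + B_s s + J_s s^2} \tag{5}$$

which may be expressed in the quadratic form: 

$$G_s = \frac{1}{K_s\left[1 + 2\zeta \dfrac{s}{\omega_0} + \left(\dfrac{s}{\omega_0}\right)^2\right]} \tag{6}$$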
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For the ship under consideration, the coefficients are approxi- 
mately: 


$J_s$ = 56,000 ton-ft-sec² ± 20% (7a) 
$B_s$ = 2000 ton-ft-sec (7b) 
$K_s$ = 18,000 ton-ft/rad (7c) 


in which the long ton of 2240 lb is used. From equations (5), (6), 
and (7), the damping ratio and natural frequency are calculated 
as: 


$\zeta$ = 0.0314 (8a) 
$\omega_0$ = 0.57 rad/sec (8b) 


corresponding to a natural period of 11 sec. A recent observation 
at sea during a moderate swell showed that the ship was indeed 
poorly damped, but that its rolling period was 13 seconds, pre- 
sumably indicating an error in the inertia figure. 
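
The damping ratio and natural frequency follow from matching the standard second-order form, $\omega_0 = \sqrt{K_s/J_s}$ and $\zeta = B_s / (2\sqrt{K_s J_s})$. The sketch below recomputes them from the quoted coefficients and, as an added illustration, estimates the inertia that would be consistent with the observed 13-sec period if the spring constant were correct; the function names are illustrative.

```python
import math

J_S = 56000.0   # ton-ft-sec^2, nominal
B_S = 2000.0    # ton-ft-sec
K_S = 18000.0   # ton-ft/rad


def roll_mode(inertia, damping, stiffness):
    """Natural frequency (rad/sec), damping ratio, and period (sec) of the roll mode."""
    omega0 = math.sqrt(stiffness / inertia)
    zeta = damping / (2.0 * math.sqrt(stiffness * inertia))
    return omega0, zeta, 2.0 * math.pi / omega0


def inertia_for_period(stiffness, period):
    """Inertia that gives the stated undamped period for a given stiffness."""
    return stiffness * (period / (2.0 * math.pi)) ** 2


omega0, zeta, period = roll_mode(J_S, B_S, K_S)
inertia_13_sec = inertia_for_period(K_S, 13.0)
```

Under that assumption the 13-sec period would call for an inertia of roughly 77,000 ton-ft-sec², which lies outside the stated 20 per cent tolerance.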

Unfortunately, not much is known about the statistical nature 
of the disturbing sea moment $T_s$ under various sea conditions 
[7]. Nevertheless, there are some useful oscillograms [2, 4, 5, 
8, 9] of the roll angle of various ships, operating with their damp- 
ing fins stowed at neutral. An examination of these oscillograms 
indicates that the waveform, although somewhat random, has a 
predominant frequency content near the ship’s natural fre- 
quency. This is to be expected, since the quadratic transfer 
function is so poorly damped. 

Thus it is concluded that the roll-damping system should be 
designed to be particularly effective at frequencies near $\omega_0$. Good 
suppression of sinusoidal disturbances from zero frequency to 
$2\omega_0$ or perhaps $3\omega_0$ would be desirable. 

Sensing Instrumentation. In the general case, the roll-sensing 
instrumentation produces a signal $R_c$ that is given by: 

$$R_c = k_1\theta + k_2\frac{d\theta}{dt} + k_3\frac{d^2\theta}{dt^2} \tag{9}$$

where $k_1$, $k_2$, and $k_3$ are, respectively, the sensitivity to roll angle, 
roll rate, and roll acceleration. Lags or resonances of the instruments 
are negligible at frequencies of interest. It follows that 
the transfer function $H$ is: 

$$H = \frac{R_c}{\theta} = k_1 + k_2 s + k_3 s^2 \tag{10}$$

In practice, the frequency response of this transfer function has 
a lower “break frequency” approximately equal to $k_1/k_2$ and an 
upper break frequency of $k_2/k_3$. It will be shown that acceleration 
feedback is probably not necessary, and so $k_3$ will be zero. 
(See also Ref. [7].) 

A damped pendulum, supported by the shaft of an angular 
transducer such as a synchro, would be a suitable instrument for 
measuring roll angle. If the effective length of the pendulum 
were 3.0 in., then its natural frequency would be 11.3 radian/sec, 
and so its response to rolling would be quite fast. A steady horizontal 
acceleration $A_h$ of the pivot point would produce an angular 
error equal to $A_h/g$ radian, where $g$ is the acceleration of 
likely to be large, particularly if the pendulum is mounted near 
the ship’s center of gravity. Also, it may be desirable to stabilize 
a passenger ship to the apparent vertical [7]. 
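
The pendulum figures can be checked with the simple-pendulum relation $\omega = \sqrt{g/L}$. The sketch below assumes $g$ = 386.09 in./sec² and uses the small-angle error $A/g$ quoted in the text; the function names and the sample acceleration are illustrative.

```python
import math

G_IN_PER_SEC2 = 386.09   # acceleration of gravity in in./sec^2 (assumed value)


def pendulum_natural_frequency(length_in):
    """Undamped natural frequency of a simple pendulum, radian/sec."""
    return math.sqrt(G_IN_PER_SEC2 / length_in)


def acceleration_error_deg(horizontal_accel_in_per_sec2):
    """Small-angle pendulum error caused by a steady horizontal acceleration, degrees."""
    return math.degrees(horizontal_accel_in_per_sec2 / G_IN_PER_SEC2)


pendulum_omega = pendulum_natural_frequency(3.0)
error_for_one_percent_g = acceleration_error_deg(0.01 * G_IN_PER_SEC2)
```

A steady horizontal acceleration of one per cent of gravity would give an error of a little over half a degree.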

The most practical and economical device for measuring roll 
rate appears to be a single-degree-of-freedom gyro, with spring 
restraint and viscous damping to prevent oscillation. A suitable 
gyro appears to be the Minneapolis-Honeywell GG79A5 Rate 
Gyro [12], which is rated for 6 deg/sec maximum, a threshold of 
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0.1 deg/sec or less, a resonant frequency of about 4 cycles/sec 
and a maximum damping ratio of 1. This device has a wirewound 
potentiometer, so that a d-c or a-c voltage proportional to angular 
rate may be obtained. 

Fin Servomechanism. The design of the fin servomechanism will 
be explored more thoroughly later. The servo produces a fin 
angle $\phi$ proportional to the fin command signal $R_c$. It utilizes a 
hydraulic motor that produces a rate of change of fin angle which 
is proportional to the servo error signal; as a result the open-loop 
transfer function of the fin servo has a single integration. The 
crossover frequency at which this transfer function falls to unity 
is designated as $\omega_\phi$, which is well above the ship’s natural 
frequency $\omega_0$. Thus the closed-loop transfer function of the fin servo 
at all frequencies of interest is approximately: 

$$G_\phi = \frac{\phi}{R_c} = \frac{k_\phi}{1 + \dfrac{s}{\omega_\phi}} \tag{11}$$

where $k_\phi$ in radians/volt is the ratio of fin angle to fin command 
at zero frequency. 

Fin Moment on Ship. In Fig. 3, the transfer function $G_f$ represents 
the fin dynamics, such that $\phi$ times $G_f$ equals $T_f$, the transform 
of the fin’s roll moment. It seems possible that there will be 
a lag of fin torque $T_f$ for a rapidly varying fin angle, partly because 
a new flow pattern cannot be established instantaneously. 
Therefore it is assumed that the transfer function $G_f$ can be 
approximated as: 

$$G_f = \frac{T_f}{\phi} = \frac{k_f}{1 + \tau_f s} \tag{12}$$

where the factor $k_f$ varies with the square of ship’s velocity and 
$\tau_f$ is a small unknown time constant. This design is concerned 
with a small range of ship’s velocity and so $k_f$ will be assumed as a 
constant value given by: 

$$k_f = \frac{1330 \text{ ton-ft}}{30 \text{ deg}} = 2540 \text{ ton-ft/rad} \tag{13}$$

Frequency Response of Roll-Damping System. The most convenient 
way to analyze the response and effectiveness of the 
roll-damping system without feed-forward is as follows: (1) Assume 
that the system does not saturate at any time and that it is therefore 
linear, i.e., conforms to the principle of superposition. (2) 
Find the roll angle $\theta$ caused by a sine-wave torque disturbance 
$T_s$. (3) Compare this roll angle $\theta$ to the roll angle $\theta_{off}$ with the 
fins stowed. 

From equations (3), (5), (10), (11), and (12), it is easily shown 
that the transfer function from the disturbing sea moment $T_s$ to 
the roll angle $\theta$ is: 

$$\frac{\theta}{T_s} = \frac{G_s}{1 + G_s H G_\phi G_f} = \frac{G_s}{1 + Y} \tag{14}$$

where $Y$ is the loop ratio $G_s H G_\phi G_f$. If we may neglect the lag in 
the fin servo and the hypothetical time constant $\tau_f$ in the fin 
dynamics, then equation (14) may be expressed as: 

$$\frac{\theta}{T_s} = \frac{1}{(K_s + k_1 k_\phi k_f) + (B_s + k_2 k_\phi k_f)s + (J_s + k_3 k_\phi k_f)s^2} \tag{15}$$

It is seen that angle feedback adds an apparent spring stiffness, 
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rate feedback adds an apparent damping, and acceleration feed- 
back adds an apparent inertia [7]. 

Let us now compare $\theta$ with $\theta_{off}$, both of which quantities can be 
and have been recorded at sea. It is apparent from equations 
(3), (5), and (14) that the roll with fins stowed ($Y = 0$) is simply: 

$$\frac{\theta_{off}}{T_s} = G_s \tag{16}$$

and so the desired ratio is: 

$$\frac{\theta}{\theta_{off}} = \frac{1}{1 + Y} \tag{17}$$

which is a typical equation for a regulator system. Thus, the 
larger the value of $Y$ at a given frequency, the better is the 
improvement in roll reduction. Now, assume that $k_3$ and $\tau_f$ are 
zero and that the break frequency $k_1/k_2$ is not larger than the 
ship’s natural frequency $\omega_0$. Then, at frequencies between $\omega_0$ 
and $\omega_\phi$, the loop ratio $Y$ may be approximated by: 

$$Y \approx \frac{1}{j\omega}\left(\frac{k_f}{J_s}\right)(k_2 k_\phi) \tag{18}$$

and the frequency $\omega_c$ at which the $Y$-asymptote falls to unity 
magnitude is: 

$$\omega_c = \left(\frac{k_f}{J_s}\right)(k_2 k_\phi) \tag{19}$$

From equations (10) and (11) and the Introduction, the product 
$k_2 k_\phi$ is equal to the ratio of steady-state fin angle to roll rate 
(with $k_1$ equal to zero): 

$$k_2 k_\phi = \frac{30 \text{ deg}}{0.86 \text{ deg/sec}} = 34.9 \text{ sec} \tag{20}$$

So from equations (7), (13), (19), and (20), the asymptotic 
gain-crossover frequency is: 

$$\omega_c = 1.58 \text{ rad/sec} \tag{21}$$

which corresponds to 0.252 cycle/second. By an interesting 
coincidence, this gain-crossover frequency of the roll-damping loop is 
the same as the maximum frequency for which the fin servo must 
be able to drive the fin sinusoidally through the maximum angle. 
This coincidence indicates a fortunate match between frequency 
response of the roll-damping system and power capability of the 
fin servomechanism. 
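
The crossover arithmetic can be reproduced in a few lines. This sketch assumes the asymptotic crossover frequency is the rate-feedback product (fin angle per unit roll rate) times the fin moment constant, divided by the ship inertia; the variable and function names are illustrative.

```python
import math

J_S = 56000.0                                   # ton-ft-sec^2
FIN_MOMENT_PER_RAD = 1330.0 / math.radians(30)  # ton-ft/rad, about 2540
RATE_PRODUCT_SEC = 30.0 / 0.86                  # sec, full fin angle per 0.86 deg/sec


def loop_crossover(rate_product_sec, fin_moment_per_rad=FIN_MOMENT_PER_RAD, inertia=J_S):
    """Asymptotic gain-crossover frequency of the roll-damping loop (rad/sec, cycle/sec)."""
    omega_c = rate_product_sec * fin_moment_per_rad / inertia
    return omega_c, omega_c / (2.0 * math.pi)


omega_c, crossover_hz = loop_crossover(RATE_PRODUCT_SEC)
omega_c_doubled, _ = loop_crossover(2.0 * RATE_PRODUCT_SEC)
```

Doubling the rate sensitivity, so that 0.43 deg/sec commands full fin angle, doubles the crossover frequency to about 3.16 radian/sec.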

A plot of the magnitude of $Y$ versus frequency is shown in Fig. 
4, in which the instrumentation break frequency $k_1/k_2$ has been 
set equal to $\omega_0$. This setting helps to make $Y$ reasonably large 
at low frequencies, yet makes the asymptotic slope equal to −6 
db/octave between $\omega_0$ and $\omega_\phi$, which contributes to system 
stability. The servo crossover frequency $\omega_\phi$ has been chosen as 8.9 
radian/sec so that the servo lag will be only 10 deg at the system 
crossover frequency of 1.58 radian/sec. 
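
For a servo whose closed-loop response is a single lag with break frequency 8.9 radian/sec, the phase lag at any frequency is the arctangent of the frequency ratio. The sketch below, with an illustrative function name, evaluates it at the loop crossover frequency and at the ship's natural frequency.

```python
import math


def single_lag_phase_deg(omega, break_frequency):
    """Phase lag in degrees of 1 / (1 + s / break_frequency) at s = j * omega."""
    return math.degrees(math.atan(omega / break_frequency))


lag_at_crossover = single_lag_phase_deg(1.58, 8.9)        # loop gain-crossover frequency
lag_at_natural_freq = single_lag_phase_deg(0.57, 8.9)     # ship's natural frequency
```

The lag is about 10 deg at 1.58 radian/sec and under 4 deg at 0.57 radian/sec.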

The acceleration gain $k_3$ has been set equal to zero; if instead 
the frequency $k_2/k_3$ were set between $\omega_0$ and $\omega_\phi$, then a portion 
of the asymptotic frequency response would have zero slope. On 
the other hand, $k_2/k_3$ could be set equal to $\omega_\phi$ to counteract the 
servo lag, but this does not seem to be necessary. It is concluded 
that, although acceleration feedback might be used to improve 
performance at high frequency, it probably is not necessary if 
the rest of the roll-damping system is well designed [7]. 

To carry through the comparison of $\theta$ with $\theta_{off}$, the magnitude 
of equation (17) versus frequency is plotted in Fig. 5. It is apparent 
that the improvement is moderately good at low frequencies 
and is quite good near $\omega_0$, where the magnitude of $\theta_{off}$ would 
be large. It would probably be instructive to plot $\theta/T_s$ also, 
which would be a rather overdamped quadratic. 
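
The roll-reduction curve can be approximated numerically. The sketch below is built on several assumptions: the reduction ratio is $1/|1 + Y|$, the loop ratio is the product of the ship quadratic, angle-plus-rate sensing with break frequency equal to the natural frequency, a single-lag servo at 8.9 radian/sec, and a constant fin moment factor, with acceleration feedback and the fin time constant set to zero. All names are illustrative.

```python
import math

J_S, B_S, K_S = 56000.0, 2000.0, 18000.0        # ship coefficients
FIN_MOMENT_PER_RAD = 1330.0 / math.radians(30)  # ton-ft/rad
RATE_PRODUCT_SEC = 30.0 / 0.86                  # sec, rate gain times servo gain
OMEGA_0 = math.sqrt(K_S / J_S)                  # ship natural frequency, rad/sec
ANGLE_PRODUCT = OMEGA_0 * RATE_PRODUCT_SEC      # angle gain times servo gain (break at OMEGA_0)
OMEGA_SERVO = 8.9                               # servo crossover frequency, rad/sec


def loop_ratio(omega):
    """Complex loop ratio Y at frequency omega (rad/sec)."""
    s = 1j * omega
    ship = 1.0 / (K_S + B_S * s + J_S * s**2)
    sensing_and_servo = (ANGLE_PRODUCT + RATE_PRODUCT_SEC * s) / (1.0 + s / OMEGA_SERVO)
    return ship * sensing_and_servo * FIN_MOMENT_PER_RAD


def roll_reduction(omega):
    """Magnitude of stabilized roll divided by roll with fins stowed."""
    return 1.0 / abs(1.0 + loop_ratio(omega))


reduction_low = roll_reduction(1e-6)
reduction_resonance = roll_reduction(OMEGA_0)
reduction_high = roll_reduction(50.0)
```

Under these assumptions the roll is reduced to roughly a quarter at very low frequency and to a few per cent near resonance, while the system has almost no effect far above the servo bandwidth.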
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Means for Improvement in Roll Damping. How might the roll 
reduction be improved? The first way would be to increase the 
magnitude of $Y$. It is conceivable that the loop ratio $Y$ might be 
doubled by having a roll rate of 0.43 deg/sec (instead of 0.86 
deg/sec) call for full fin angle. Hence the product $k_2 k_\phi$ would be 
69.8 sec instead of 34.9 sec, and $\omega_c$ would become 3.16 radian/sec. 
Thus $\theta/\theta_{off}$ would be improved at all frequencies of interest. 
However, the most severe limitation on the loop ratio is thought 
to be resolution of the instrumentation. A good spring-restrained 
rate gyro such as the Minneapolis-Honeywell GG79 would have 
a threshold of 0.1 deg/sec or possibly better. This is adequate 
for the system in which 0.86 deg/sec calls for full fin angle, but 
questionable for the 0.43-deg/sec system. There are other gyros 
with better resolution, but they are much more expensive. 

The second way in which roll reduction might be improved is 
to use “feed-forward” compensation, which Chadwick [7] terms 
“feed-ahead.” Feed-forward compensation supplements feedback 
control by measuring an outside disturbance and applying 
a corrective signal before the disturbance has time to upset the 
system. Ideally, the disturbance should be exactly canceled by 
the corrective signal. Suppose that a suitable feed-forward 
instrumentation with transfer function $G_c$ is included as shown in 
Fig. 3, so as to measure the sea torque $T_s$. It is easily shown that 
the rolling response is then: 

$$\frac{\theta}{T_s} = \frac{G_s}{1 + Y} - \frac{G_s G_c G_\phi G_f}{1 + Y} \tag{22}$$
where the second term represents the effect of the feed-forward
instrumentation. It is apparent that if 


1 — = 0 (23) 


then the roll response θ will be zero. Clearly, no additional stability
problem is involved, because the loop ratio Y is unchanged.
Furthermore, the transfer function G, does not have to be cali- 
brated exactly, because any value of 1 — G,G4G, that is less than 
unity will produce a reduction in θ. If G, is equal to 1/kgk,, then
1 — GGG, is nearly zero. 

It would appear that the technical feasibility of applying feed-forward
compensation hinges on the instrumentation for measuring
the disturbance torque T_s. The author believes that there is
a simple way to measure the quantity ' — @, i.e., the slope of the 
sea surface relative to the ship’s deck. If so, then T and W GMT 
may be found by straightforward circuitry. The latter term is 
the predominant part of the sea torque T_s; alternatively, a
two-term approximation to T_s may be used (see equation 4). At the
time of this writing, the hypothetical measurement scheme had 
not been evaluated by a naval architect and so it will not be 
presented here. 


Design of Valve-Controlled Fin Servomechanism 


See the Introduction for the major specifications. Simple- 
harmonic motion of the fin servo at a frequency of 0.25 cycle/sec 
(i.e., 1.57 radian/sec) to 30 deg peak calls for a peak angular 
velocity (at neutral) of 0.821 radian/sec and a peak angular
acceleration (at maximum fin angle) of 1.29 radian/sec².
The torque required to give the fin this acceleration is 2740 lb-in., 
which is negligible compared to the hydrodynamic load torque of 
291,000 lb-in. at maximum fin angle. The gain-crossover fre- 
quency ws of the servo has already been chosen as 8.9 radian/sec 
or more. 
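
The kinematic figures above follow from simple-harmonic motion, for which the peak velocity is the amplitude times the angular frequency and the peak acceleration is the amplitude times the angular frequency squared. The sketch below reproduces them; the variable names are illustrative, and the fin inertia is not stated in the paper but merely inferred here from the quoted acceleration torque.

```python
import math

FREQ_CPS = 0.25                 # specified sinusoid, cycle/sec
PEAK_ANGLE_DEG = 30.0           # peak fin angle, deg
ACCEL_TORQUE_LB_IN = 2740.0     # quoted torque to accelerate the fin, lb-in.
LOAD_TORQUE_LB_IN = 291000.0    # quoted hydrodynamic load torque, lb-in.

omega = 2.0 * math.pi * FREQ_CPS             # radian/sec
amplitude = math.radians(PEAK_ANGLE_DEG)     # radian

peak_velocity = amplitude * omega            # radian/sec, occurs at neutral
peak_acceleration = amplitude * omega ** 2   # radian/sec^2, at maximum fin angle

# Inferred (not given in the text): inertia implied by torque / acceleration.
implied_inertia = ACCEL_TORQUE_LB_IN / peak_acceleration   # lb-in.-sec^2
torque_ratio = ACCEL_TORQUE_LB_IN / LOAD_TORQUE_LB_IN

print(f"omega             = {omega:.2f} radian/sec")
print(f"peak velocity     = {peak_velocity:.3f} radian/sec")
print(f"peak acceleration = {peak_acceleration:.2f} radian/sec^2")
print(f"implied inertia   = {implied_inertia:.0f} lb-in.-sec^2")
print(f"inertia torque / load torque = {100.0 * torque_ratio:.1f} per cent")
```

The computed values agree with the quoted ones to within slide-rule accuracy, and the inertia torque is under one per cent of the hydrodynamic load torque.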

Two general types of servomotor appear possible for utiliza- 
tion: (1) A ram or pair of rams with linkages to a crosshead, and 
(2) a vane-motor of limited rotation, directly coupled to the fin 
shaft. In the past Vosper [8, 9] has used a pair of opposing cylin- 
ders connected to a crosshead, with flexible hoses to allow for the 
motion of the cylinders. The major disadvantage of this configuration
is that the hoses contribute a large “line compliance,”
which increases the total compliance of the motor and may lower
its resonant frequency drastically [13]. Furthermore, the joints
and bearings tend to contribute “mechanical compliance” [13].

A vane-motor of limited rotation is a more compact device and 
it does not require the detrimental hoses. A suitable vane-motor 
would be the Houdaille Industries model 308349-D (with some 
modifications), having a radian displacement D_r of 165.6 in.³
This machine would develop the estimated maximum hydrody- 
namic fin torque of 130 ton-in. or 291,000 lb-in. at a theoretical 
pressure drop of 291,000/165.6 or 1760 psi. Partly because the 
maximum hydrodynamic load torque is not known with precision, 
a system pressure of 3000 psi has been chosen. 
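
The pressure figure is the load torque divided by the radian displacement. The sketch below repeats that arithmetic and shows the margin offered by the 3000-psi supply; it assumes the ton is the long ton of 2240 lb, which is what makes 130 ton-in. agree with 291,000 lb-in. Variable names are illustrative.

```python
LB_PER_LONG_TON = 2240.0          # assumption: long (British) ton
RADIAN_DISPLACEMENT_IN3 = 165.6   # in.^3 per radian
SUPPLY_PRESSURE_PSI = 3000.0

max_torque_lb_in = 130.0 * LB_PER_LONG_TON                        # lb-in.
theoretical_drop_psi = 291000.0 / RADIAN_DISPLACEMENT_IN3         # psi
pressure_margin = SUPPLY_PRESSURE_PSI / theoretical_drop_psi      # dimensionless

print(f"130 ton-in. = {max_torque_lb_in:.0f} lb-in.")
print(f"theoretical pressure drop = {theoretical_drop_psi:.0f} psi")
print(f"supply / required pressure = {pressure_margin:.2f}")
```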

The manufacturer has estimated that the total compliance of
this machine is about 10⁻⁷ radian/lb-in. It appears that the
compliances of the connecting shaft, fluid lines, and base can be
made lower than this value. From reference [13], the fin inertia
and motor compliance can be used to calculate the mechanical
resonant frequency ω_m of the motor and load as 69 radian/sec.
The transfer function from valve stroke X to fin angle φ is:


$$
G_m(s) = \frac{\phi}{X} = \frac{k_v}{D_r\,s}\cdot\frac{1}{1 + \dfrac{2\zeta}{\omega_m}\,s + \dfrac{s^2}{\omega_m^2}} \tag{24}
$$

where k_v is the incremental ratio of valve flow-rate to valve stroke
at the particular conditions of flow and pressure.
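
The quoted resonant frequency can be checked by treating the fin as an inertia on a torsional spring, so that the undamped resonance is the reciprocal of the square root of inertia times compliance. The sketch below is only a plausibility check under two assumptions that are not stated in this form in the paper: the compliance is taken as 1e-7 radian/lb-in., and the fin inertia is inferred from the 2740 lb-in. needed for 1.29 radian/sec² of acceleration.

```python
import math

COMPLIANCE_RAD_PER_LB_IN = 1.0e-7    # assumed order of magnitude, radian/lb-in.
ACCEL_TORQUE_LB_IN = 2740.0          # quoted inertia torque, lb-in.
PEAK_ACCEL_RAD_S2 = 1.29             # quoted peak acceleration, radian/sec^2

# Inferred fin inertia (lb-in.-sec^2); the paper takes its value from elsewhere.
fin_inertia = ACCEL_TORQUE_LB_IN / PEAK_ACCEL_RAD_S2

# Undamped resonance of an inertia J on a spring of compliance C: 1/sqrt(J*C).
omega_m = 1.0 / math.sqrt(fin_inertia * COMPLIANCE_RAD_PER_LB_IN)

print(f"inferred fin inertia = {fin_inertia:.0f} lb-in.-sec^2")
print(f"resonant frequency   = {omega_m:.1f} radian/sec")
```

With these inputs the resonance comes out within about one per cent of the 69 radian/sec quoted, far above the servo crossover frequency of 8.9 radian/sec.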


A possible choice of servo valve would be the Raymond 
Atchley Model 450, which has a flow rating of 50 GPM at 1000-psi 
total pressure drop across its orifices. An alternate choice would 
be the Sanders Model SV-324, rated for 45 GPM at 1000-psi pres- 
sure drop. Calculations show that either valve would be able to 
control the motor through the specified maximum sinusoid at 
0.25 cps without being stroked more than half the maximum,
and so generous margins are available. 
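
A rough check of the valve sizing can be made at the point of peak flow. The sketch below is not the author's calculation: it assumes the usual square-root orifice law for the valve, U.S. gallons of 231 in.³, and that the load pressure is negligible when the fin passes through neutral, so that the full 3000-psi supply is available across the valve orifices at that instant. Other points of the cycle, where flow and load pressure are both appreciable, are not examined here.

```python
import math

GPM_PER_IN3_PER_SEC = 60.0 / 231.0   # assumption: U.S. gallon = 231 in.^3
RADIAN_DISPLACEMENT_IN3 = 165.6      # in.^3 per radian
SUPPLY_PRESSURE_PSI = 3000.0
RATED_DROP_PSI = 1000.0              # pressure drop at which valves are rated

# Peak fin velocity for 30 deg amplitude at 0.25 cycle/sec.
peak_velocity = math.radians(30.0) * 2.0 * math.pi * 0.25   # radian/sec
peak_flow_gpm = peak_velocity * RADIAN_DISPLACEMENT_IN3 * GPM_PER_IN3_PER_SEC


def stroke_fraction(rated_gpm, flow_gpm, valve_drop_psi=SUPPLY_PRESSURE_PSI):
    """Fraction of full valve stroke needed, using flow ~ stroke * sqrt(drop)."""
    full_stroke_flow = rated_gpm * math.sqrt(valve_drop_psi / RATED_DROP_PSI)
    return flow_gpm / full_stroke_flow


print(f"peak motor flow = {peak_flow_gpm:.1f} GPM")
for name, rated in (("50-GPM valve", 50.0), ("45-GPM valve", 45.0)):
    print(f"{name}: stroke fraction = {stroke_fraction(rated, peak_flow_gpm):.2f}")
```

Under these assumptions both valves need less than half of full stroke at peak flow, in line with the statement above.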

The electrical reference-input signal and the feedback signal 
from an angular transducer mounted on the back of the motor 
would enter a simple proportional amplifier that drives the valve 
coil. The gains of the feedback transducer and the servo ampli- 
fier may be calculated in a straightforward manner. 

An oil supply of nominally constant pressure is required, either 
from a variable-delivery pump or a constant-displacement pump 
with unloading valve and accumulator. The latter is much less 
expensive, and a suitable pump would be a Denison 800 Series unit.
This pump delivers 34.7 GPM at 3000 psi with an input power 
of 69.7 horsepower at 1200 rpm. Since the peak flow-rate into 
the servomotor during the specified sinusoid is 35.4 GPM, only 
a small volume is needed from the accumulator to supply the 
peak flow-rate. 

The outstanding disadvantage of this valve-controlled servomechanism
is the high power consumption, resulting from the
fact that a given angular movement of the servomotor requires a 
proportional volume of oil to be throttled through 3000 psi with 
a proportional energy loss, regardless of the load torque on the 
fin. For the specified sinusoid, the peak motor flow-rate is 35.4 
GPM, and so the flow-rate to the servo valve is the average of the
“rectified sine-wave,” i.e., 2(35.4)/π or 22.5 GPM. This means
that the pump must operate at 3000 psi with a duty cycle of
22.5/34.7, i.e., 65 per cent of the time. Thus for the specified
sinusoid the average power consumption at the pump shaft (not 
allowing for motor losses) is 65 per cent times 69.7 or about 45 
horsepower, which is all dissipated in throttling the oil. A rather 
large heat exchanger must be provided. 
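
The power estimate rests on the fact that the mean of a rectified sine wave is 2/π times its peak. The sketch below repeats the arithmetic with the figures quoted in the text; variable names are illustrative.

```python
import math

PEAK_MOTOR_FLOW_GPM = 35.4   # peak flow during the specified sinusoid
PUMP_FLOW_GPM = 34.7         # pump delivery at 3000 psi
PUMP_INPUT_HP = 69.7         # pump input power while delivering at 3000 psi

# Mean of a rectified sine wave is (2/pi) times its peak value.
average_flow_gpm = 2.0 * PEAK_MOTOR_FLOW_GPM / math.pi

# Fraction of the time the pump must be loaded to supply that average flow.
duty_cycle = average_flow_gpm / PUMP_FLOW_GPM

# Average shaft power, ignoring drive-motor losses as in the text.
average_shaft_hp = duty_cycle * PUMP_INPUT_HP

print(f"average flow     = {average_flow_gpm:.1f} GPM")
print(f"duty cycle       = {100.0 * duty_cycle:.0f} per cent")
print(f"average shaft hp = {average_shaft_hp:.0f}")
```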


Design of Pump-Controlled Fin Servomechanism 


The high power losses of the valve-controlled system make it 
very desirable to investigate a solution with a variable-stroke 
pump to furnish oil to the motor, even though it will entail some 
additional complexity. 

Assume that the same motor is used, and that therefore the 
peak flow-rate will be 35.4 GPM during the specified sinusoid, 
and the peak pressure will be about 1760 psi at zero flow-rate. 

A possible choice of variable-displacement pump would be the 
Denison variable-volume, stem-control 800 Series, with a flow 
rating of 34.9 GPM at 1000 psi and rated speed. This flow-rating 
will compromise the maximum velocity of the servomotor, but 
only slightly. (Alternatively, to avoid this derating effect, a 
servomotor of somewhat smaller radian displacement could be 
chosen, so as to obtain higher velocity and still operate well be- 
low the 5000-psi pressure rating of the pump.) It is understood 
from the pump manufacturer that the effective mass at the stem 
(stroke-rod) is about 40 lb, and that the pressure forces in the 
pump cause a force that tends to restore the stem to neutral. 
This latter force is proportional to output pressure and is estimated
as 200 to 250 lb at 2000 psi. Incidentally, the author believes
that the subject of stroking forces and the resulting interaction
of stroker dynamics with pump-motor dynamics merits
investigation on a theoretical as well as experimental basis.

In order to operate the pump in a stable fashion, a stroking 
servo is required which makes the stroke proportional to the fin 
error signal. This stroking servo could utilize a double-acting 
piston with an annular area of 1.0 in.², a Raymond Atchley Model
410 servo valve, and a constant pilot pressure of 1000 psi,
furnished by an auxiliary pump and accumulator. This pump and
a sequence valve also furnish the 50-psi supply. The open-loop
transfer function of this stroking servo by itself will have an inte- 
gration and probably a resonance as in equation (24). Thus its 
closed-loop transfer function at low frequencies will be of the form 
given in equation (11). In order to achieve fast response and 
stability in the fin servo, it would be desirable to keep the phase 
lag less than 10 deg at the fin-servo crossover frequency of 8.9 
radian/sec. This would dictate a crossover frequency of at least
50.5 radian/sec in the stroke servo, which is far under the resonance
due to the pump-hanger mass and stroking-cylinder compliances.
Feedback around the stroking servo can be from a
linear-variable-differential-transformer and no compensation is 
required. 
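
The 50.5-radian/sec figure is what results if the closed-loop stroking servo is approximated at low frequencies by a simple first-order lag whose break frequency equals its crossover frequency. That reading of equation (11) is an assumption made for this sketch; the phase lag of such a lag at frequency ω is arctan(ω/ω_c), so the requirement can be solved directly for ω_c. The function name is illustrative.

```python
import math


def min_crossover_for_phase_lag(omega_rad_s, max_lag_deg):
    """Smallest break frequency of 1/(1 + s/wc) giving at most max_lag_deg at omega.

    Assumes the closed-loop stroking servo behaves as a first-order lag.
    """
    return omega_rad_s / math.tan(math.radians(max_lag_deg))


FIN_SERVO_CROSSOVER = 8.9   # radian/sec
MAX_PHASE_LAG_DEG = 10.0

omega_c = min_crossover_for_phase_lag(FIN_SERVO_CROSSOVER, MAX_PHASE_LAG_DEG)
lag_check_deg = math.degrees(math.atan(FIN_SERVO_CROSSOVER / omega_c))

print(f"required stroke-servo crossover = {omega_c:.1f} radian/sec")
print(f"phase lag at 8.9 radian/sec     = {lag_check_deg:.1f} deg")
```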

Since the same servomotor is being used in both the pump- 
controlled system and in the valve-controlled system, it would 
appear that the transfer function G_m in equation (24) is nearly
unchanged, except for damping changes relating to the nearly flat 
flow-pressure curves of the pump. 

The higher efficiency of this type of servo results in a much 
lower power requirement. For instance, at a fin angle of —10 deg 
during the specified sinusoid, the velocity must be 0.775 radian/ 
sec and the load torque is −150,000 lb-in. This calls for a motor
flow-rate of 128.1 in.³/sec and a pressure drop of 905 psi, and a
hydraulic power input of 17.6 HP. From manufacturer's data, the
efficiency of the pump in this pressure range is calculated as 80.5 
per cent, and so the shaft horsepower to the pump would be about 
21.8 HP. Further analysis shows that this is the peak power 
required by the pump, because the higher load torques are re- 
quired at lower velocities. Actually, about 4 HP peak would be 
required by the auxiliary pump. 
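
The operating point quoted above can be reproduced from the sinusoid and the motor displacement. In the sketch below the velocity at a 10-deg fin angle follows from simple-harmonic motion, the load torque of 150,000 lb-in. is taken from the text (it comes from Fig. 2 and is not computed here), and one horsepower is taken as 6600 in.-lb/sec. Variable names are illustrative.

```python
import math

RADIAN_DISPLACEMENT_IN3 = 165.6    # in.^3 per radian
IN_LB_PER_SEC_PER_HP = 6600.0      # 550 ft-lb/sec
PUMP_EFFICIENCY = 0.805            # quoted from manufacturer's data
LOAD_TORQUE_LB_IN = 150000.0       # magnitude at 10 deg fin angle, from the text

amplitude = math.radians(30.0)
omega = 2.0 * math.pi * 0.25
fin_angle = math.radians(10.0)

# For phi = A sin(wt), the speed at angle phi is A*w*sqrt(1 - (phi/A)^2).
velocity = amplitude * omega * math.sqrt(1.0 - (fin_angle / amplitude) ** 2)

flow_in3_per_sec = velocity * RADIAN_DISPLACEMENT_IN3
pressure_drop_psi = LOAD_TORQUE_LB_IN / RADIAN_DISPLACEMENT_IN3
hydraulic_hp = flow_in3_per_sec * pressure_drop_psi / IN_LB_PER_SEC_PER_HP
shaft_hp = hydraulic_hp / PUMP_EFFICIENCY

print(f"velocity      = {velocity:.3f} radian/sec")
print(f"motor flow    = {flow_in3_per_sec:.1f} in.^3/sec")
print(f"pressure drop = {pressure_drop_psi:.0f} psi")
print(f"hydraulic hp  = {hydraulic_hp:.1f}")
print(f"shaft hp      = {shaft_hp:.1f}")
```

The results agree with the quoted figures to within rounding. The peak of about 22 HP compares with the 69.7 HP that the constant-displacement pump of the valve-controlled system draws whenever it is loaded.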

Thus the pump-controlled servo requires much less peak power 
from the ship than does the valve-controlled servo, and the cool- 
ing problem is greatly reduced. Furthermore, consideration of 
Fig. 2 shows that power will flow from the fin through the 
servomotor, pump, and drive motor into the ship’s electrical 
power system during part of the operation, thus reducing the 
average power drain markedly. This reverse flow of power is not 
possible with a valve-controlled servo, because the valve can func- 
tion only by dissipating energy in its orifices. 

The choice of fin servomechanism would depend on a number 
of factors, such as cost and complexity, even though power 
considerations favor the pump-controlled servo very strongly. 
The larger installations [1, 6, 7] always use pump-controlled fin 
servos, whereas many smaller installations [8, 9] have used valve- 
controlled servos. 
DISCUSSION
Peter Du Cane²


The author has written a paper of great interest to engineers 
and naval architects involved in the problem of roll reduction. 
The language and method is that of a hydraulic and electronic 
servo-mechanisms engineer, on which latter subject the writer of 
this discussion does not feel competent to comment. However, 
where the author touches upon the hydrodynamic aspect there 
are one or two points worthy of mention, as well as some practical 
points arising from experience afloat with the Vosper roll damp- 
ing systems up to date. 

Referring to Fig. 1 the sea surface is shown as a straight line 
at an angle to the horizontal. This in the text is presumably in- 
tended to represent a wave surface causing a moment WGMT. 
Actually, of course, this so-called wave slope can only be repre- 
sented by making use of an expression to convert the sinusoidal or 
trochoidal wave form which more nearly approaches reality into 
an equivalent wave slope. This equivalent slope, referred to as the
“effective wave slope,” is arrived at by various means described in
Ref. [14]* and is to a considerable degree frequency dependent.

Although the steady-state hydrodynamic torque on the fin for
stock at quarter chord is as shown in Fig. 2, this does not in fact represent
the actual time history of the torque which is required to move 
the fins under seagoing conditions. The negative range of torque 
is unlikely to be experienced owing to the dynamic considerations 
involved in accelerating the mass of the fins as well as the en- 
trained water. The greater the value of rotary acceleration of the 
fins the greater will be the transient torque and lift increment. 

It may be worth mentioning that by comparison with an in- 
stallation with retractable fins the value of the factor Fins on/Fins 
off will be less owing to the quite substantial damping effect of 
the fins even when static. 

A control signal from a spring-restrained gyro is envisaged such
that full fin angle of 30 deg is called for by a roll rate
of 0.43 deg/sec. It is usually considered that a control valve
must be fitted to operate the hydraulic cylinders such that fin
angle is proportional to rate signal. If, however, a roll rate as
low as 0.43 deg/sec, or for that matter 0.86 deg/sec, is required
to produce full fin angle, it can be understood that for fairly small
ships a residual roll of this amount will be very likely to develop
for a large part of the time. Therefore, the full fin angle will be
called for almost whenever the gyro transmits a signal. The
characteristic of the hydraulic control valve can therefore be
effectively of the ON-OFF type without detriment to performance
so long as the cutoff is not so severe as to cause an instability
(judder). That this characteristic can be incorporated satisfactorily
has been proved in the case of many Vosper installations.
This is not to say that proportional control does not represent the
optimum.

² Vosper Ltd., Portsmouth, England.

* Numbers [14] and [15] designate References at end of this
discussion.

The proposed feed-forward technique has not often been
used in practical operation and would seem to present difficulties if
reliance is placed on pressure measurements at the hull. There
are, for instance, velocity heads which will be picked up as well as 
differences in wave profile due to the longitudinal wave caused by 
speed through the water. 

It seems a possibility that the technique of using a “sonic”
probe at the side might provide a useful signal proportional to 
wave height relative to the gunwale. 

This technique is used in controlling the foil incidence in the
“SEA LEGS” hydrofoil; in this case the probe measures the distance
between stem and wave surface with the object of controlling
pitch and heave.

The comparison of vane motor with cylinders is of interest, but 
though attractive in certain ways there can exist an alignment 
problem where the fin stock connects with the shaft of the vane 
motor. The only way to avoid this would be to incorporate the 
vane motor in the actual fin shaft in which case it would be 
necessary to arrange for flexibility in the hydraulic supply pipes. 
Also some difficulty will probably be experienced if wear takes 
place in the bearings in way of the bottom plating of the ship. 
The Houdaille motor specified is interesting in that it can operate 
at 3000 psi which is unusual for this type owing to leakage at the 
vane tips. 

Though the author is worried about the compliance in flexible 
hoses we have not in fact experienced trouble in this direction up
to date, although we have used them extensively. 

The merits of the pump-controlled servomechanism as against
the valve-controlled mechanism as set out are important and
would seem to depend, as the author states, on economic factors.

The valve controlled system can use a simpler type of pump 
and in conjunction with accumulators can reduce pump duty. 
However, much will depend on how important it is to save hy- 
draulic pumping power. There are, for instance, cases in small 
vessels where the hydraulic pump is driven from the main engines 
by a belt. For this case a little power at peak loads more or less 
is unlikely to be noticed in the year end reckoning. 

Another way of saving power is to fit a number of small fins 
totalling the fin power to develop the torque required. 

Because hinge moment is proportional to (chord)²,
it can be understood that there is a substantial reduction in total 
torque required where a number of fins are fitted each with re- 
duced chord. 

It is hoped these notes will be of some interest. 
Author's Closure
The author appreciates the interesting discussion of Commander 
Du Cane and concurs with most of his views. On the other hand, 
it would seem that an “on-off” control valve will probably cause
severe problems of instability. 